7. Introduction to Ruby Gems

Overview

By the end of this chapter, you will be able to import external data and code to improve the functionality of an application; use Ruby gems in programs; interact with file systems and file modes in Ruby; read and write files to and from disk using Ruby; import external CSV data into Ruby applications and export it again; and use service objects to package code for reuse within applications.

Introduction

In the previous chapter, we learned about code reusability and how to clean up our code base by extracting common functionality and logic into modules that can be included as needed throughout our project, preventing unnecessary code duplication.

This is an important concept to grasp as it forms the base for Ruby's excellent package management system known as RubyGems, which we will dive into further in this chapter.

Most applications consist of inputs and outputs. Facebook will have data in the form of photos and status updates (as input), and users, in turn, will see other users' photos and status updates (as output). Additionally, a banking application will load data from a database (as input) and present it to the user in the form of charts and tables (as output). The input data sources will vary per application, but the concept of inputs and outputs is essentially the same.

Data is fed into the application, some sort of processing is performed, and there is an output action, be it saving to a database, exporting data to another format, or simply printing out a processed version of the input.

A common scenario in the workplace is the need to process data in the form of comma-separated values (CSV) that may have been imported from another system. Following this, some sort of processing is performed on the values and, finally, a result is output to the user in a way that helps them to understand the data.

In this chapter, we will look at handling this exact type of scenario. We will look at importing and exporting CSV data, processing it, and then outputting a result using an external library that's going to format the data into a nice readable table for us.

We will also take a closer look at RubyGems, how we can interact with the package manager, and how to utilize external gems in our own code base. We'll then wrap everything up by implementing everything we've learned about as a service object in our code.

RubyGems and the require Method

Similar to the concept of including modules, Ruby has another way of including external code into your project, which is known as a gem. Essentially, a Ruby gem is a package of code that can be included in your project, much like a module, with a few key differences such as the ability to version a particular package and the ability to load other dependent gems at the same time.

Generally speaking, a gem is more of a collection of modules and classes than a single module or class. Gems can be tiny and can solve a single problem, such as formatting screen output, or they can be an entire application framework. The Ruby on Rails framework, for example, is a gem itself.

Most modern languages have an equivalent way of loading external code packages into an application. These are commonly referred to as package managers.

For Node.js, you would use npm or yarn; for Python, you would use pip; for C#, you would use NuGet; and for Ruby, we use RubyGems.

So, why would we want to include other external code and libraries in our own code base? Well, quite simply, to save us time and effort. Consider the following examples.

You're building a new application and you want to allow for user registration so that customers can sign in to your website. Creating a robust user authentication and registration system is no small task. You would probably need to answer the following questions before you begin:

  • Do you understand cryptography well enough to implement a secure password hashing algorithm?
  • What about allowing users to reset their password?
  • How about sending a user a confirmation email on sign up?
  • What if you want to allow people to sign in with Facebook or Twitter?

You could write these yourself, but it would take a lot of time and effort and you'll more than likely make mistakes that could compromise the security of your application. Thankfully, with RubyGems, we can simply include the devise gem (https://packt.live/318fy8k) into our project and have a fully featured authentication and user registration system that solves our problems in a matter of minutes.

But why stop at user authentication? Let's say our application needs to upload files to a remote location; well, we can just add the carrierwave gem (https://packt.live/33nzOV2).

What if you need to paginate the results of your web page? In that case, you just add the will_paginate gem (https://packt.live/2VuQuYa).

You can begin to see how we can create a very functional application in no time at all by leveraging this external code in the form of gems. This allows us to focus on what our core application functionality needs to do, rather than the standard functionality that we all expect from any application, such as being able to sign up and log in.

Now let's take a look at how we can interact with Ruby gems. The following are the gem functions we are going to study next:

  • gem search
  • gem install
  • gem dependency
  • gem Versioning
  • gem list

Let's take a look at each of these, one by one.

gem search

gem search is used to search for available gems by name. Run the following command on the Terminal:

$ gem search terminal-table

The output would be as follows:

Figure 7.1: Output for gem search

Note

Pass the --local flag to the gem search command to search only locally installed gems, or the --remote flag to search only remote gems.

gem install

As you might expect, gem install will install a gem. It does so when you simply pass in the name of a gem to the command. By default, it will install the latest version of the gem:

$ gem install terminal-table

To install a specific version of a gem, pass the version number to the --version flag (shown here as a <version> placeholder):

$ gem install terminal-table --version <version>

The output would be as follows:

Figure 7.2: Output for gem install

As you can see, there are two gems installed: terminal-table and unicode-display-width. The unicode-display-width gem is a dependency, which simply means that the creators of the terminal-table gem are using (much like including) the unicode-display-width gem in their own code and have listed it as a dependent gem in their own gem's definition.

RubyGems is intelligent enough to figure out all of these dependency chains and install them as required. As you can imagine, gems depending on other gems can go many levels deep, which would be a pain to manage yourself. Thankfully, we don't need to think about that with RubyGems.

gem dependency

As we saw previously, gems can have dependencies on other gems. You can view a gem's dependencies with the gem dependency command as follows:

$ gem dependency terminal-table

The output would be as follows:

Figure 7.3: Output for gem dependency

You'll notice that there are several gems listed here; however, all but one are assigned to the development group. This is essentially saying that those gems are only required in the development environment for the terminal-table gem.

By default, these will not be installed when installing the terminal-table gem, as only a gem's non-development dependencies are installed by default.

gem Versioning

In the following output, notice the numbers with the arrows next to the gem names in some of the gem commands:

Figure 7.4: Output for gem versioning

These are known as semantic versioning constraints. They essentially tell RubyGems what range of versions of that gem is required to be installed. RubyGems is able to read these numbers and select a compatible version of that gem to download and install.

You can translate them to mean "I need a version equal to or greater than version 1.10," or "I need a version that is no older than version 3.0."

Note

More information on how this works can be found at https://packt.live/2IF9wWq.

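The ~> symbol in the semantic versioning constraints is called a twiddle-wakka. It is a pessimistic constraint: ~> 1.10, for example, allows any version from 1.10 up to, but not including, 2.0.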
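To get a feel for how these constraints are evaluated, we can ask RubyGems directly. The following is a minimal sketch that uses the Gem::Requirement and Gem::Version classes that ship with RubyGems; the version_allowed? helper name and the sample version numbers are our own assumptions, chosen only for illustration.

```ruby
# Gem::Requirement and Gem::Version ship with RubyGems itself.
require 'rubygems'

# Returns true when `version` satisfies every constraint in `constraints`.
# (version_allowed? is a hypothetical helper name used only in this sketch.)
def version_allowed?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

puts version_allowed?('1.12', '~> 1.10') # 1.12 is at least 1.10 and below 2.0
puts version_allowed?('2.0', '~> 1.10')  # 2.0 falls outside the allowed range
puts version_allowed?('3.4', '>= 3.0')   # anything from 3.0 upward is accepted
```

Several constraints can be combined for one gem, in which case a version has to satisfy all of them.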

gem list

gem list lists locally installed gems. It is extremely helpful when we are trying to understand what gems and their versions are installed:

Figure 7.5: Output for gem list

You'll notice that there are a lot of gems listed even though you may have only installed one. The gems with default: included in the parentheses are default gems that ship as part of Ruby's standard library. They come installed with Ruby and cannot be removed.

You can see from the preceding output that the two gems at the bottom are the ones that we installed previously and do not have the default: label attached to them.

Note

Depending on the version of Ruby you are using, this gem listing may appear differently. Newer versions of Ruby may include additional default gems.

Using Gems in Your Code

To use a gem, you simply require the gem in your code, which is similar to how you would include a module. Consider the following example:

user = { name: 'John Smith', age: '35', address: { home: '1 kings cross road' }}

puts JSON.pretty_generate(user)

# NameError (uninitialized constant JSON)

require 'json'

puts JSON.pretty_generate(user)

{

  "name": "John Smith",

  "age": "35",

  "address": {

    "home": "1 kings cross road"

  }

}

In the preceding example, we create a simple hash containing some user information and we attempt to convert it to JSON and display it in a formatted way using the JSON.pretty_generate method.

We can then see that it throws a NameError error because the JSON gem has not been required and is, therefore, not available. This is very much like trying to use a method from a module before you've included it.

In the lines that follow the error, we require the json gem (which is a default Ruby gem). Then, we call the JSON.pretty_generate method again; we can see it now works and formats our hash into a pretty JSON format.

It really is that simple to load other libraries into our code and extend the functionality of our application.
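It is also worth knowing what happens when a library cannot be found: require raises a LoadError. The following minimal sketch wraps require in a helper to show this. The gem_available? method name and the made-up gem name are assumptions introduced purely for this example.

```ruby
# Returns true if the named library can be loaded, false otherwise.
# (gem_available? is a hypothetical helper, not part of RubyGems.)
def gem_available?(name)
  require name
  true
rescue LoadError
  false
end

puts gem_available?('json')                      # json ships with Ruby
puts gem_available?('no-such-gem-for-this-demo') # a made-up gem name
```

A check like this can be handy when a gem is optional and we want to fall back to simpler behavior if it has not been installed.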

So, we've learned what Ruby gems are now and how they can be used to extend the functionality of our application by simply "requiring" them in our code. Let's try it out for ourselves now.

In the following exercise, we'll learn how to use a Ruby gem to format and present a basic data structure in our Terminal windows in a readable format. Creating a neatly presented table of information in a Terminal window is a tricky task; it's not something we would want to repeat for every project, so let's make it easier for ourselves and use one that has already been built.

Exercise 7.01: Installing and Using a Ruby Gem

In this exercise, we will be installing a Ruby gem, terminal-table, to generate a table of individuals and their locations and then print it.

The following steps will help you to complete this exercise:

  1. Install the terminal-table gem. From your Terminal, run the following command:

    gem install terminal-table

  2. Create an exercise_1.rb script that will require the terminal-table gem. Generate a collection of users and print them to the Terminal as a table:

    require "terminal-table"

    headings = ["Name", "City"]

    users = [

            ["James", "Sydney"],

            ["Chris", "New York"]

    ]

    table = Terminal::Table.new rows: users, headings: headings

    puts table

  3. Run the script with the following command:

    ruby exercise_1.rb

    You should see a table printout of our users with a heading row, as follows:

    Figure 7.6: Table using a Ruby gem

Thus, we have successfully represented data in a tabular form, using the terminal-table gem.

File I/O

The ability to open, read, and write from the filesystem is an important part of any language. Thankfully, Ruby has quite an extensive and user-friendly file I/O interface.

The IO class is responsible for all input and output operations in Ruby. The File class is a subclass of the IO class:

File.superclass

=> IO

When we interact with the filesystem, we are generally always working with the File class, although it is helpful to understand where it sits in the class hierarchy.

Let's take a look at some common file operations:

  • Creating files
  • Reading from files
  • Writing to files

Creating Files

We can create new files by instantiating a File object and passing the name of the file and the file mode to the initializer:

file = File.new("new.txt", "w")

=> #<File:new.txt>

file.close

When we create or open files using the File.new method, we also need to call close afterward to tell Ruby to release the handle it has opened for the file. When using the File.open method with a block, close is automatically called for us. We shall discuss this in more detail later on in the chapter.

You might be wondering what the w parameter that appears after the filename is. This tells Ruby which mode we want to use. In this example, we've set the mode to w, which means we want to write to a new file. Unless we supply this parameter, the file mode defaults to r, which is short for read. The read mode is, as you may have guessed, only for reading existing files. Attempting to create a new file using this mode will give you an error like the following:

Figure 7.7: An ENOENT error

We will cover file modes more extensively later.
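In the meantime, here is a minimal sketch of how the default read mode behaves when the file is missing, and how we might rescue the resulting Errno::ENOENT error. The try_open helper name is an assumption made for this example, and a temporary directory is used so that the sketch does not touch any real files.

```ruby
require 'tmpdir'

# Tries to open `path` in the default read mode and reports what happened.
# (try_open is a hypothetical helper name used only in this sketch.)
def try_open(path)
  file = File.new(path) # no mode given, so Ruby uses "r"
  file.close
  :opened
rescue Errno::ENOENT
  :missing
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'new.txt')
  puts try_open(path)       # the file does not exist yet
  File.new(path, 'w').close # the "w" mode creates the file
  puts try_open(path)       # now it can be opened for reading
end
```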

Reading Files

Reading the contents of files is quite a simple process with Ruby. There are a few different methods for reading and processing files:

  • Using the File.read method
  • Using the File.open method
  • Using the File.foreach method

We'll use a test file named company.txt for this section that contains the following content:

ACME Company

555 Mystery Lane

2010

Let's take a look at each of these file reading methods in turn.

Using the File.read Method

The File.read method will read the whole file into memory at once and handle the file just like a large string:

File.read("company.txt")

=> "ACME Company\n555 Mystery Lane\n2010\n"

We see that the entire file is loaded into memory and returned as a single string.

The newline characters are represented as \n in the string.

The File.read method has one potential drawback: loading a large file into memory all at once can be inefficient. For larger files, a more memory-friendly approach is to process the file line by line, which leads us to the next sections on File.open and File.foreach.

Using the File.open Method

The File.open method on its own simply returns an instance of the File class indicating we have an open file handle on the company.txt file:

File.open("company.txt")

=> #<File:company.txt>

Passing a block to the File.open method, however, gives us the open file as a block parameter, which we can then iterate over one line at a time with each. File.open automatically closes the file when the block exits:

File.open('company.txt') do |file|

  file.each do |row|

    puts row

  end

end

The output would be the following:

ACME Company

555 Mystery Lane

2010

Using the File.foreach Method

Much like calling each on a file opened with File.open, we can use the slightly more succinct File.foreach method, which does essentially the same thing without the need to call each explicitly. It also closes the file for us once the iteration has finished:

File.foreach('company.txt') do |row|

  puts row

end

The output would be the following:

ACME Company

555 Mystery Lane

2010

read versus open versus foreach

You could easily be forgiven for being confused about these methods that seemingly do the exact same thing. From an end user's point of view, that's true; however, from a programming perspective, they are quite different.

The File.read method will load the entire contents of a file into memory for us to process at once. This may be suitable for smaller files, but for anything larger, this can have a serious impact on your system and application performance.

The File.open method with a block and the File.foreach method, however, process the contents of a file one line at a time. This allows Ruby to manage memory more efficiently, and they are generally a safer option for when your files are of varying sizes.

We will cover more on this topic of loading and processing data one row at a time in the Handling CSV Data section.

Note

Loading external data can have a detrimental effect on application performance if it is not done correctly. Understanding how data is being allocated and cleaned up by Ruby is a key factor in ensuring consistent performance.
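To make the difference concrete, the following minimal sketch counts the lines in a file using File.foreach, which holds only one line in memory at a time, and then does the same with File.read, which loads the whole file first. The count_lines helper name and the temporary file are assumptions made for this example.

```ruby
require 'tempfile'

# Counts the lines of a file while holding only one line in memory at a time.
# (count_lines is a hypothetical helper name used only in this sketch.)
def count_lines(path)
  count = 0
  File.foreach(path) { |_row| count += 1 }
  count
end

Tempfile.create('company') do |file|
  file.write("ACME Company\n555 Mystery Lane\n2010\n")
  file.flush
  puts count_lines(file.path)            # streams the file row by row
  puts File.read(file.path).lines.length # loads the whole file, then splits it
end
```

Both approaches report the same count; the difference lies in how much of the file has to sit in memory while the work is being done.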

Writing to Files

There are several ways to write to files in Ruby, each with a slightly different use case:

  • Using the File.new method
  • Using the File.open method with a block
  • Using the File.write method

Let's take a look at each of them.

Using the File.new Method (Initializer)

File.new will return an instance of a file with an open file handle, which means we are able to write to it:

file = File.new("new.txt", "w")

file.puts "Hey, nice file"

file.close

Calling puts on the file object here writes the string to the file, although you'll still need to call close on the object before you can access the contents of the file from outside the application.

Note

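You can call file.puts, file.write, or file << "my string" to write to the File object. The only difference in the result is that puts appends a newline when the string does not already end with one, whereas write and << write the string exactly as given.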
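As a quick way to compare the three writing methods, the following minimal sketch writes the same string with each of them and then reads the file back. The write_three_ways helper name is an assumption made for this example, and the file lives in a temporary directory.

```ruby
require 'tmpdir'

# Writes the same string three ways and returns the resulting file contents.
# (write_three_ways is a hypothetical helper name used only in this sketch.)
def write_three_ways(path, text)
  file = File.new(path, 'w')
  file.puts text  # adds a trailing newline
  file.write text # writes the text as-is
  file << text    # also as-is; returns the file, so calls can be chained
  file.close
  File.read(path)
end

Dir.mktmpdir do |dir|
  p write_three_ways(File.join(dir, 'new.txt'), 'Hey')
end
```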

Using the File.open Method with a Block

This method allows us to pass a block with the instantiated file object as a parameter. This is a cleaner approach that allows you to create, write, and close a file all in one block of code:

File.open("new.txt", "w") do |file|

  file.write("Hey, nice file")

end

The section after the open statement is a Ruby block and it allows us to encapsulate our write logic into a section of code that, when completed, will automatically close the file so we don't need to call file.close manually. It looks much cleaner.

Using the File.write Method

File.write("new.txt", "Hey, nice file")

The File.write method is more of a shorthand syntax. The write method is actually defined on the parent IO class rather than on the File class itself. It is a quick way of opening a file, writing a string to it, and closing it with the smallest amount of code. Rather than a file handle, the write method returns the number of bytes written.
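The return value is easy to check for ourselves. The following minimal sketch compares what File.write returns with the size of the file on disk; the write_and_measure helper name is an assumption made for this example.

```ruby
require 'tmpdir'

# Writes text to path and returns [value returned by File.write, size on disk].
# (write_and_measure is a hypothetical helper name used only in this sketch.)
def write_and_measure(path, text)
  returned = File.write(path, text)
  [returned, File.size(path)]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'new.txt')
  p write_and_measure(path, 'Hey, nice file') # both numbers are byte counts
end
```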

File.open versus File.new

You will see both of these methods used in examples online. It mostly comes down to personal preference, but File.open is the more useful method due to the fact that it supports the ability to pass in a block and immediately iterate over a collection, writing out the results to a file before automatically closing the file when the block exits.

There are times, however, when you may wish to open a file and pass the file reference to another method for processing before closing the file. In this case, you may prefer to use File.new over File.open.

Both methods will return an instance of File (File.open does so only when called without a block; with a block, it returns the value of the block) and can be used to pass a reference to the open file around the rest of the code base. However, using File.new for this specific use case and File.open only with blocks can help to make your code easier to understand, as other engineers will know that whenever they see a File.new method, there needs to be a corresponding .close method call for the instantiated object.
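The following minimal sketch shows one way this convention might look in practice: a file is created with File.new, handed to another method for processing, and then closed in an ensure clause so that the handle is released even if an error occurs. The write_report and export_names method names are assumptions made for this example.

```ruby
require 'tmpdir'

# Writes one name per line to an already-open file handle.
# (write_report is a hypothetical helper; it neither opens nor closes the file.)
def write_report(file, names)
  names.each { |name| file.puts name }
end

# Opens the file with File.new, delegates the writing, and always closes it.
# (export_names is also a hypothetical name.)
def export_names(path, names)
  file = File.new(path, 'w')
  begin
    write_report(file, names)
  ensure
    file.close # the matching close for File.new
  end
  path
end

Dir.mktmpdir do |dir|
  path = export_names(File.join(dir, 'names.txt'), %w[James Chris])
  puts File.read(path)
end
```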

File Modes

We've seen the usage of file modes in the previous examples. To put it simply, they tell Ruby how much access we want to enable for the file we're going to interact with.

There are several file modes that we can choose from depending on the requirement we have for the file. The most common are the read mode, r, and the write mode, w.

The following table is from the official Ruby documentation from the IO class and explains the meaning behind each of the modes:

Figure 7.8: File modes
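To see a few of these modes in action, the following minimal sketch writes to the same file using the w and a modes. The demo_modes helper name and the sample strings are assumptions made for this example; the comments summarize the standard behavior of each mode.

```ruby
require 'tmpdir'

# Common modes, for reference:
#   "r"  read-only (the default)     "r+" read and write, file must exist
#   "w"  write-only, truncates       "w+" read and write, truncates
#   "a"  write-only, appends         "a+" read and write, appends
# (demo_modes is a hypothetical helper name used only in this sketch.)
def demo_modes(path)
  File.open(path, 'w') { |f| f.puts 'first' }  # creates the file
  File.open(path, 'a') { |f| f.puts 'second' } # adds to the end
  after_append = File.read(path)
  File.open(path, 'w') { |f| f.puts 'third' }  # wipes the old contents first
  [after_append, File.read(path)]
end

Dir.mktmpdir do |dir|
  after_append, after_rewrite = demo_modes(File.join(dir, 'modes.txt'))
  puts after_append
  puts after_rewrite
end
```

The key point is that w always starts from an empty file, whereas a keeps whatever is already there.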

File Helpers

Aside from reading and writing data, Ruby comes with a bunch of very helpful file helpers right out of the box. These can help you to solve a number of common file-related tasks, such as checking for the existence of files and permissions and deleting files. Here are a few examples of useful file helpers in Ruby.

File.exist?

This checks for the existence of a file. It returns true or false. Use this before opening an existing file to ensure that it exists, or before creating a new one to avoid overwriting a file that is already there.

Dir.exist?

This checks for the existence of a directory. It returns true or false. It is very much like File.exist?, except that it works on directories. Use this before creating a new directory to make sure the directory does not already exist and to avoid throwing errors.

File.delete

This deletes a file when given a file path.

File.size?

This returns the size of a file in bytes, or nil if the file does not exist or is empty. You may wish to use this to report on the size of datasets after importing them or to verify the size of a file before processing it.

File.truncate

This truncates a file to a given length in bytes; passing a length of 0 clears its contents. You can use this to reset a file's content back to empty before writing to it. It is helpful if you wish to reuse a particular file.

File.zero?

This returns true if the file exists and is empty. Use this when you want to verify whether a file has any content or not.

File.birthtime

This returns the birth time (that is, the time of creation) of a file. Use this when you want to know how old a file is.

Note

You can read more about all the available file options in the official Ruby documentation for the File class here: https://packt.live/35nehxE.
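The following minimal sketch strings several of these helpers together on a throwaway file so that we can watch their return values change as the file is created, emptied, and deleted. The inspect_file helper name is an assumption made for this example.

```ruby
require 'tmpdir'

# Collects the results of a few file helpers for a given path.
# (inspect_file is a hypothetical helper name used only in this sketch.)
def inspect_file(path)
  {
    exists: File.exist?(path),
    empty: File.zero?(path),
    size: File.size?(path) # nil when the file is missing or empty
  }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.txt')
  p inspect_file(path)   # before the file is created
  File.write(path, 'hello')
  p inspect_file(path)   # after writing five bytes
  File.truncate(path, 0) # reset the contents to empty
  p inspect_file(path)
  File.delete(path)
  p File.exist?(path)    # the file is gone again
  p Dir.exist?(dir)      # the directory is still there
end
```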

Handling CSV Data

CSV is a very common format for representing tabular data. It is an easily parsable data format to work with and it can be opened in all common spreadsheet applications, such as Microsoft Office and Google Docs, with no need for conversion.

We can represent columns and rows with CSV, much like a relational database or a spreadsheet, which makes it a very handy tool for processing exported database records, generating data to be imported into a database, or creating spreadsheets.

Ruby comes with a full library for handling CSV data out of the box. The Ruby CSV library is actually a gem and is part of the Ruby default gem set. This means that to use the CSV library in your code, you simply need to "require" it.

We can see the csv gem with the following gem list command:

$ gem list | grep csv

csv (default: 1.0.0)

Ruby has even published this gem publicly on GitHub (https://packt.live/35qKCUf), just like any other gem.

All modern versions of Ruby (1.9 and later) will have a default CSV gem. The Ruby CSV gem from Ruby 1.9.3 and later is actually based on a popular CSV parsing gem called FasterCSV. This was an optional replacement for the core CSV gem before Ruby 1.9.3; however, as it was so popular, the Ruby team simply replaced the default CSV library with FasterCSV.

There are other CSV gems out there if you have more specific needs. SmarterCSV is another well-known replacement for the default CSV gem that offers parallel import processing for the better handling of larger files.

Note

You can refer to the following for more information on SmarterCSV: https://packt.live/2B5xSVk.

Similar to the File class we covered previously, the CSV gem has a number of different ways in which we can interact with CSV data and files. Let's take a closer look.

Reading Data from a CSV File Using CSV.read

The simplest way to load CSV data is with the CSV.read method. This is an "all-at-once" method that will load the entire CSV document at once into memory and return an array of arrays representing the data.

Let's imagine that we have a users.csv file containing the following CSV data:

Mike Smith,35,Sydney

James Taylor,42,New York

Susan Jones,29,San Francisco

We can read the data with the following:

CSV.read("users.csv")

This returns the following:

=> [["Mike Smith", "35", "Sydney"], ["James Taylor", "42", "New York"], ["Susan Jones", "29", "San Francisco"]]

Here, we can see that the data has been loaded into an array. We can also see that each row in the CSV data has been represented as an array inside the main array, so we have a nested array or an array of arrays.

We can iterate over this array to get access to the individual rows:

require 'csv'

users = CSV.read("users.csv")

users.each do |user|

  puts "name: #{user[0]}"

  puts "age: #{user[1]}"

  puts "city: #{user[2]}"

end

This returns the following:

name: Mike Smith

age: 35

city: Sydney

name: James Taylor

age: 42

city: New York

name: Susan Jones

age: 29

city: San Francisco

This is great. We're able to load data from an external file and interact with it in Ruby with only a few lines of code.

But what kind of problems do you think will arise if we have a wide dataset with many columns or fields? Well, accessing the data with array index positions can get confusing when there are many columns.

The numbers in the square brackets after the user variable are index positions. As we can see from the preceding array-of-arrays example, each element of the array is another array containing the user information. For each user array, we see that index position 0 is the name, index position 1 is the age, and index position 2 is the city.

As you can imagine, if you have a dataset with many columns or fields, this can get confusing, as you need to keep track of exactly which column is at which index. For example, is user[18] the address or is it user[12]? Not only is this a bit confusing, but it makes our code hard to read. Wouldn't it be nicer if we could access the data based on the actual field name?

Using Column Headers as Field Names

Thankfully, Ruby makes it easy to refer to row data by the field name, which makes our code cleaner and easier to read.

If, instead, our previous dataset included a header row, it would look as follows:

name,age,city

Mike Smith,35,Sydney

James Taylor,42,New York

Susan Jones,29,San Francisco

We can simply pass in the headers parameter to the read method:

headers: true

We can now use the heading name to access the row data:

require 'csv'

users = CSV.read("users_with_headers.csv", headers: true)

users.each do |user|

  puts "name: #{user["name"]}"

  puts "age: #{user["age"]}"

  puts "city: #{user["city"]}"

end

This returns the following output:

name: Mike Smith

age: 35

city: Sydney

name: James Taylor

age: 42

city: New York

name: Susan Jones

age: 29

city: San Francisco

Much better. How does that work, though? How do you access an array position with a string?

The simple answer is that you don't. Ruby is performing some magic behind the scenes here when the headers parameter is supplied, and, instead of returning an array of arrays as we saw previously, it is returning an instance of the CSV::Table class:

require 'csv'

CSV.read("users_with_headers.csv", headers: true)

=> #<CSV::Table mode:col_or_row row_count:4>

The core Ruby documentation describes the CSV::Table class (https://packt.live/2OKCIPT) as follows:

"A CSV::Table is a two-dimensional data structure for representing CSV documents. Tables allow you to work with the data by row or column, manipulate the data, and even convert the results back to CSV, if needed."

This simply means that it's a more flexible representation of the dataset than simply an array of arrays. It allows you to interact with the data in different dimensions, be it by row or by column.

Note

While both usability and reliability are increased, there is a performance cost of converting the dataset into hashes and the CSV::Table and CSV::Row types.

For example, we can retrieve the first row with the by_row method, as shown in the following code block:

users = CSV.read("users_with_headers.csv", headers: true)

users.by_row[0]

=> #<CSV::Row "name":"Mike Smith" "age":"35" "city":"Sydney">

This returns an instance of the CSV::Row class, which has its own set of helper methods.
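As a brief illustration of those helper methods, the following minimal sketch pulls the first row out of a table and inspects it with headers, fields, and to_h. To keep the example self-contained, it parses an inline string with CSV.parse rather than reading a file; the SAMPLE_CSV constant and the first_row helper name are assumptions made for this example.

```ruby
require 'csv'

# A small inline dataset so the sketch does not depend on a file on disk.
# (SAMPLE_CSV and first_row are hypothetical names used only in this sketch.)
SAMPLE_CSV = <<~ROWS
  name,age,city
  Mike Smith,35,Sydney
  James Taylor,42,New York
ROWS

# CSV.parse with headers: true builds a CSV::Table from a string.
def first_row(csv_text)
  CSV.parse(csv_text, headers: true).by_row[0]
end

row = first_row(SAMPLE_CSV)
p row.headers # the column names
p row.fields  # the values, in column order
p row.to_h    # a plain hash keyed by column name
p row['city'] # a single field, looked up by header
```

Converting a row with to_h can be useful when the rest of our code expects ordinary hashes rather than CSV-specific objects.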

We can just as easily return the first column of all rows using the by_col method.

To retrieve the values of the first column and return them as an array, we can do the following:

users = CSV.read("users_with_headers.csv", headers: true)

users.by_col[0]

=> ["Mike Smith", "James Taylor", "Susan Jones"]

Only use the CSV.read method when you're dealing with small datasets. Loading large CSV files into memory can cause performance issues and result in excessive resource consumption. We'll discuss the usage of CSV.foreach for handling larger datasets in a moment.

Exercise 7.02: Reading Data from a .csv File and Printing a Few Columns

In most cases, we are only interested in a part of the data and not the entire dataset. In this exercise, we will be obtaining only the city column from exercise_2.csv, which contains other columns too, such as name and age. To do so, perform the following steps:

  1. Download exercise_2.csv from the code bundle. It should contain the following content:

    name,age,city

    Mike Smith,35,Sydney

    James Taylor,42,New York

    Susan Jones,29,San Francisco

  2. Create a new file named exercise_2.rb, and then load the CSV data into a users variable:

    require 'csv'

    users = CSV.read("exercise_2.csv", headers: true)

  3. Add the code to retrieve a listing of cities from the users object and then print them out:

    cities = users.by_col["city"]

    puts cities

  4. Run the script to see the listing of cities:

    ruby exercise_2.rb

    You should see a response as follows:

    Figure 7.9: Reading CSV data

Reading Data from a .csv File Using CSV.foreach

Previously, we have been using the CSV.read method to load all CSV data at once into memory before working with it. However, if you're working with a large dataset, this may cause resource consumption issues.

Processing the data row by row will allow Ruby to manage the memory usage more efficiently on your machine and is generally considered a more idiomatic approach. This can be done by using the CSV.foreach method.

Consider the following example:

require 'csv'

CSV.foreach('users.csv') do |user|

  puts "name: #{user[0]}"

  puts "age: #{user[1]}"

  puts "city: #{user[2]}"

end

In the preceding example, we call CSV.foreach with the filename, just like we did for CSV.read, which opens the file. However, instead of assigning the result to a variable, we supply a block that is called once for each row in order to print the results, much like what we did earlier by using each to loop over the array.

Once again, we can supply the headers: true parameter to invoke the named key functionality we saw earlier, which provides us with a cleaner interface to retrieve the row data:

require 'csv'

CSV.foreach("users_with_headers.csv", headers: true) do |user|

  puts "name: #{user['name']}"

  puts "age: #{user['age']}"

  puts "city: #{user['city']}"

end

Running both of the preceding foreach examples results in the same response:

name: Mike Smith

age: 35

city: Sydney

name: James Taylor

age: 42

city: New York

name: Susan Jones

age: 29

city: San Francisco

Both CSV.read and CSV.foreach open the files in read-only mode unless specifically set otherwise by the user. As we're only reading the data, the default read-only mode is all we need.

Response Type Variations

We saw previously with CSV.read that, by default, it will return an array of arrays, unless you specify the headers: true parameter, in which case it will return an instance of CSV::Table, which contains a number of CSV::Row instances to represent the row data.

You might assume that the CSV.foreach method works in the same way, but it is actually slightly different again.

When called without a block, the CSV.foreach method returns an instance of Enumerator, which is a core Ruby class used for iterating over collections. When the headers: true parameter is supplied, this enumerator yields instances of CSV::Row to represent the row data, in the same way the CSV.read method did; without it, each row is yielded as a plain array.

Unlike the CSV.read method, however, passing in the headers: true parameter does not change the response type:

irb(main):001:0> require "csv"

=> true

irb(main):002:0> CSV.read("users.csv").class

=> Array

irb(main):003:0> CSV.read("users_with_headers.csv", headers: true).class

=> CSV::Table

irb(main):004:0> CSV.foreach("users.csv").class

=> Enumerator

irb(main):005:0> CSV.foreach("users_with_headers.csv", headers: true).class

=> Enumerator
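
Although the Enumerator class is the same in both cases, the rows it yields are not. The following sketch checks the class of the first row in each case. The two temporary files stand in for users.csv and users_with_headers.csv and are assumptions made so that the example runs on its own.

```ruby
require "csv"
require "tempfile"

# Headerless sample file.
plain = Tempfile.new(["users", ".csv"])
plain.write("Mike Smith,35,Sydney\n")
plain.close

# Sample file whose first line is a header row.
with_headers = Tempfile.new(["users_with_headers", ".csv"])
with_headers.write("name,age,city\nMike Smith,35,Sydney\n")
with_headers.close

# Without a block, CSV.foreach returns an Enumerator; first pulls one row from it.
plain_row  = CSV.foreach(plain.path).first
header_row = CSV.foreach(with_headers.path, headers: true).first

puts plain_row.class
puts header_row.class
puts header_row["name"]
```

In other words, user[0] works for both kinds of row, while user['name'] is only available once headers: true is supplied.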

For the most part, all these slightly different response objects work in a similar way; you can loop over them with an each block, perform basic enumerable functions, and treat them much like you would treat a regular array.

It is important, however, to remember that these objects are not simply arrays and, therefore, you may encounter slightly different methods or operations for interacting with the row data depending on the class.

For example, the CSV::Table class provides the by_col method, whereas the Enumerator and basic Array classes do not:

irb(main):001:0> CSV.read("users.csv", headers: true).respond_to? :by_col

=> true

irb(main):002:0> CSV.read("users.csv").respond_to? :by_col

=> false

irb(main):003:0> CSV.foreach("users.csv").respond_to? :by_col

=> false

The preceding irb log uses the respond_to? method, which simply checks whether an object will "respond to" a particular method name. In other words, it checks whether that method exists on an object or class.
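
As a brief illustration of what by_col offers, the sketch below parses an inline string rather than a file (an assumption made to keep the example self-contained) and reads whole columns from the resulting CSV::Table.

```ruby
require "csv"

data = "name,age,city\nMike Smith,35,Sydney\nJames Taylor,42,New York\n"

# With headers: true, CSV.parse returns a CSV::Table, just like CSV.read does.
table = CSV.parse(data, headers: true)

names = table["name"]    # look up a column by its header
ages  = table.by_col[1]  # look up a column by its index in column mode

puts names.inspect
puts ages.inspect

# The plain array of arrays has no such helper.
puts CSV.parse(data).respond_to?(:by_col)
```

Note that the values come back as strings; converting them to integers is left to the calling code.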

You can imagine that something as simple as changing your CSV import dataset to now include a header row may result in confusion as the response object can change. Simply being aware of these variations can be helpful and save you from having to debug the issue when something unexpected happens.

Writing Data

Writing to a CSV file follows a similar pattern to the CSV.foreach method we learned about previously, except that, here, we use the CSV.open method and supply a block.

Inside the block, we have access to the file, which we can write to by simply calling puts on the opened csv variable. In the same way as before, we do not need to manually call close as the block will automatically do that for us when it exits:

require 'csv'

CSV.open("new_users.csv", "w") do |csv|

  csv.puts ["Sarah Meyer", "25", "Cologne"]

  csv.puts ["Matt Hall", "35", "Sydney"]

end

There is an alternative syntax that you may see other people using that works in the same way and makes use of the append operator instead of the puts method. We do this by using the object << value syntax. Really, puts is just an alias for <<, so you can expect it to work in the same way:

require 'csv'

CSV.open("new_users.csv", "w") do |csv|

  csv << ["Sarah Meyer", "25", "Cologne"]

  csv << ["Matt Hall", "35", "Sydney"]

end

Both of these examples will generate a new CSV file named new_users.csv with the following content:

Sarah Meyer,25,Cologne

Matt Hall,35,Sydney

The second parameter of the CSV.open command is the file mode parameter we learned about in the File I/O section. The same rules apply here. In our case, we have passed in w, the write file mode parameter, which will create a new file each time, overwriting any previous files with the same name.

We could just as easily open the file with the a mode and append the file data to the end of the existing file if we were building a larger file over time.
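
The difference between the two modes can be sketched as follows. The temporary directory and the after_append and after_rewrite variables are assumptions used only to keep the example self-contained and easy to inspect.

```ruby
require "csv"
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "new_users.csv")

# "w" creates the file, or truncates it if it already exists.
CSV.open(path, "w") { |csv| csv << ["Sarah Meyer", "25", "Cologne"] }

# "a" keeps the existing content and adds rows to the end.
CSV.open(path, "a") { |csv| csv << ["Matt Hall", "35", "Sydney"] }
after_append = File.read(path)

# Opening with "w" again discards both earlier rows.
CSV.open(path, "w") { |csv| csv << ["Daniel Jones", "43", "New York"] }
after_rewrite = File.read(path)

puts after_append
puts after_rewrite

FileUtils.remove_entry(dir)
```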

Most CSV methods have a similar parameter structure to the file I/O methods. For the most part, they will work in the same way when it comes to closing files.

Writing CSV Data from an Enumerable

In the previous example, our data was static, meaning there was a line of code for each row inserted into the CSV file. In the real world, however, you're not likely to be doing this (as this would be incredibly time-consuming). It's more likely that you would want to export a collection of records from a database or another data source as CSV data.

This may be hundreds or thousands of rows, but the code should still only be a few lines long.

Let's imagine that we have a table of data that contains the names of cities, the country name, and the number of employees in that city. We want to export that data as CSV.

That data is then handed to our code and we need to iterate over it to generate the data. Here's what it might look like:

require "csv"

cities = [

  { name: "San Francisco", country: "United States", employees: 15 },

  { name: "Sydney", country: "Australia", employees: 11 },

  { name: "London", country: "England", employees: 18 }

]

CSV.open("employee_count.csv", "w") do |csv|

  cities.each do |city|

    csv.puts [city[:name], city[:country], city[:employees]]

  end

end

In the preceding example, we define an array of cities, where each city is a hash. We define this array as an example, but, as you can imagine, in a real-world application, this may be a very long list returned from a database table.

The output from the preceding code will be as follows:

San Francisco,United States,15

Sydney,Australia,11

London,England,18

The preceding CSV-writing code is only five lines long and will loop over the entire collection of cities, writing out the data as CSV rows.

We want to highlight here that five lines of code can process thousands of lines of data; we can achieve a very useful result very simply.
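
If the exported file is going to be read back with headers: true, it helps to write a header row first. The following is a minimal sketch of one way to do that; the headers array, the temporary directory, and the use of values_at are assumptions for illustration rather than part of the original example.

```ruby
require "csv"
require "tmpdir"
require "fileutils"

cities = [
  { name: "San Francisco", country: "United States", employees: 15 },
  { name: "Sydney", country: "Australia", employees: 11 },
  { name: "London", country: "England", employees: 18 }
]

headers = %w[name country employees]

dir  = Dir.mktmpdir
path = File.join(dir, "employee_count.csv")

CSV.open(path, "w") do |csv|
  csv << headers # the header row is written like any other row
  cities.each do |city|
    # values_at returns the values in the same order as the headers
    csv << city.values_at(:name, :country, :employees)
  end
end

written = File.read(path)

# Reading the file back with headers: true gives access by column name.
table = CSV.read(path, headers: true)
puts table["name"].inspect

FileUtils.remove_entry(dir)
```

Keeping the header list and the values_at call in the same order is what keeps the columns aligned, so the two are best changed together.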

Note

There are a number of other pieces of CSV functionality that are available with the standard core Ruby CSV library; you can refer to https://packt.live/2pc66DI for more details.

Exercise 7.03: Writing an Array of Users to a CSV File

Exporting data in CSV format is a great way to retrieve data from your application and share it with other systems or people. In this exercise, we will export an array of users as CSV much like you would export a table from a database in order to process that data with a spreadsheet or another system:

  1. Create a new file, exercise_3.rb, and require the csv gem:

    require "csv"

  2. Create an array of users. Each user will be a hash with the attributes name, age, and city:

    users = [

      { name: "John Smith", age: 36, city: "Sydney" },

      { name: "Susan Alan", age: 31, city: "San Francisco" },

      { name: "Daniel Jones", age: 43, city: "New York" }

    ]

  3. Open a new CSV file for writing and iterate over the users array, writing the CSV content to the file:

    CSV.open("new_users.csv", "w") do |csv|

      users.each do |user|

        csv.puts [user[:name], user[:age], user[:city]]

      end

    end

  4. Execute the application and inspect the contents of the new_users.csv file:

    ruby exercise_3.rb

    You should see the following content:

    John Smith,36,Sydney

    Susan Alan,31,San Francisco

    Daniel Jones,43,New York

Thus, we have successfully used a Ruby gem to write data to a file.

Service Objects

In previous sections, we learned how to install and include Ruby gems to extend the functionality of our application. We also learned how to interact with files and CSV data. We now know how to read, process, and output data to the screen or the filesystem.

This is useful functionality. However, opening files and writing CSV data is common functionality that is likely to be shared between classes, and it can seem unrelated to the existing classes in our code base.

What do we mean by unrelated? Well, let's assume we have a User class and a Company class in our application. We want the ability to load user and company data and print it to the Terminal for both of these classes. So, where do we write the code for this functionality? In the User class? In the Company class? Or, in both classes?

Possible solutions to this shared-code problem include modules, service classes, and class inheritance.

The answer to the preceding questions is none of the above; neither the User class nor the Company class is the right place for this code. What we need instead is what's known as a service object.

A service object is a class that is created to perform a specific action or set of actions.

A service object is often created in order to DRY (Don't Repeat Yourself) up your code, which simply means it contains code that may be used in multiple different places in our application, but by exposing that code from a single central location, our domain model code stays clean and relevant.

The Single Responsibility Principle

Service objects play an important role in fulfilling the single responsibility principle.

The single responsibility principle is a well-known and established software design principle whereby every class or function is responsible for a single part of the application's functionality. This responsibility should be entirely encapsulated within the defined class or module.

Simply put, this is a guideline that means a class or module should only contain logic that relates to that specific class or module.

It is important for the following reasons:

  • It makes testing our code much easier as we aren't interacting with a tangled mess of dependent classes.
  • It keeps our classes smaller and more specific. For example, users.rb deals only with users, not groups, email, or payments.
  • It promotes shared functionality. It's a lot easier to reuse code when it doesn't depend on other classes to function.
  • It's easier to debug.
  • It's easier to refactor. The code lives in a single encapsulated location, not scattered around the code base.

    Note

    The single responsibility principle is the "S" portion of the SOLID design principles. For more information, you can refer to https://packt.live/2MAFQen.

With this in mind, we can begin to see why we would want to use service objects in our application.

Service Objects versus Modules

By now, you're probably thinking, "Service objects sound very similar to the modules that we learned about earlier."

Sure, there are similarities between service objects and modules, but they are both fundamentally different concepts at their core.

Both service objects and modules will help keep your code clean and focused, both will DRY up your code base, both will promote code reusability, and both will improve the testability of your application.

So, why do we need both?

A module is used for sharing stateless functions between your code. Stateless functions typically operate on inputs from method parameters and do not assume any state or variables from the enclosing parent class. A stateless function is essentially like a class method; it does not have access to an instantiated object's variables, but merely whatever state is passed to the function when it's called. Modules are used as "mixins," which extend the functionality of our classes by "mixing in" the functions defined in the module to the class.

Modules are often thought of in a more "functional programming" aspect. They are stateless and declarative. We don't initialize or instantiate modules; this means that we tend to use them more like helpers.
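
As a rough illustration of this helper-style usage, the following sketch mixes a small stateless module into two unrelated classes. The NameFormatter, Invoice, and Badge names are hypothetical and exist only for this example.

```ruby
# Hypothetical helper module: the method depends only on its arguments.
module NameFormatter
  def format_name(first, last)
    "#{last.upcase}, #{first}"
  end
end

# Two unrelated classes share the helper by mixing it in.
class Invoice
  include NameFormatter

  def heading(first, last)
    "Invoice for #{format_name(first, last)}"
  end
end

class Badge
  include NameFormatter

  def label(first, last)
    "[#{format_name(first, last)}]"
  end
end

puts Invoice.new.heading("Sarah", "Meyer")
puts Badge.new.label("Matt", "Hall")
```

Neither class needed to be given any state for the helper to work; everything format_name needs arrives through its parameters.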

Service objects are simply classes. Classes can be instantiated. Classes can have instance variables and be passed around to other methods. A service object is just an instantiated class with a specific purpose that generally follows a format for calling the code contained in the class.

A simplified example of a service object is as follows:

require "csv"

class CSVPrinter

  def initialize(filepath)

    @filepath = filepath

  end

  def perform

    CSV.foreach(@filepath) do |row|

      puts row.to_s

    end

  end

end

In the preceding example, we can see that a service object is just a Plain Old Ruby Object (PORO). Unlike a module, however, here we have an initialize method that we can use to create an instance of the class with the filepath passed in and store it in an instance variable.

We then have an instance method called perform, which will perform the work of the service object; in this case, it's just simply printing out the rows of the CSV file.

Calling the perform method is a somewhat common design pattern that is used to standardize the interface for service objects. The words execute, run, and go are also common. As mentioned in The Single Responsibility Principle section earlier, if we have just one method for executing the logic in the service object, then it helps us to stick to those guidelines. The name of the class is used as the identifier, not the method call.

It's an optional rule that aims to keep our classes simple and clearly defined with a single responsibility; however, it's still just a suggested pattern. You may decide to have a CSVService class, for example, with methods named print and print_table and so on, rather than many individual services with a single perform method.

An example of how you might use this service object would be as follows: CSVPrinter.new("users.csv").perform.

This is the one-liner way of running the CSVPrinter service object. We can see that an instance of the CSVPrinter class is instantiated and we immediately call the perform method on it.

We could, however, store the instantiated class and call perform on it at a later date if need be:

csvprinter = CSVPrinter.new("users.csv")

csvprinter.perform
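
To try this out end to end, the sketch below repeats the CSVPrinter class from earlier and runs it against a temporary file. The temporary file and its contents are assumptions made so that the example does not depend on a users.csv file being present.

```ruby
require "csv"
require "tempfile"

class CSVPrinter
  def initialize(filepath)
    @filepath = filepath
  end

  def perform
    CSV.foreach(@filepath) do |row|
      puts row.to_s
    end
  end
end

# Sample data standing in for users.csv.
file = Tempfile.new(["users", ".csv"])
file.write("Mike Smith,35,Sydney\nJames Taylor,42,New York\n")
file.close

# Creating the object only stores the path; the file is not read yet.
csvprinter = CSVPrinter.new(file.path)

# The file is opened and printed only when perform is called.
csvprinter.perform
```

Because each row is a plain array here, row.to_s prints it in Ruby's array notation rather than as a CSV line.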

Let's take a look at an example where a service object is used.

To highlight where you might want to use a service object, let's examine some slightly more involved code that could do with some refactoring:

require "mailgun-ruby"

class User

  attr_accessor :name, :email, :address

  def initialize(name, email, address)

    @name = name

    @email = email

    @address = address

  end

  def create

    if save_to_database(self)

      send_invite_email

    end

  end

  private

  def send_invite_email

    mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']

    params = {

      from: "[email protected]",

      to: email,

      subject: "Welcome #{name}!",

      text: "Thanks for signing up ...."

    }

    mailgun.send_message 'mail.myapp.com', params

  end

end

This is a simplified example that has a single method using the mailgun gem to send an invite email after a user is created. In a real-world application, you tend to find many instances of these small, tightly coupled methods that creep into the application over time. Before you know it, your classes are hundreds of lines long and become difficult to manage.

So, what's actually wrong with doing it this way?

Well, to start with, the send_invite_email method depends on the user instance's email and name attributes to populate the recipient and subject fields, so it's dependent on the instantiated User object's attributes to function.

When you decide to change the email subject or message or add some additional fields or logic, you'll be editing the User class directly, which should really only hold logic regarding the management of User objects, attributes, and resources.

We've also required the mailgun Ruby gem into the User class as it's a requirement for the send_invite_email method to instantiate the mailgun client in order to send the email.

But do we really need to require an email library in our User class? What about when we decide that we want to send Slack messages, upload avatars to S3, or trigger an SMS to the user? Are we going to require those libraries here too?

You can see how this small class can quickly get out of control.

This code also violates the single responsibility principle by defining email-related methods inside the User class. Even though it is a user-related email, it does not need to live here. You can imagine that the User class of any application is at the very heart of the application and is likely to have a lot of added functionality attached to it in this same way.

Let's refactor this code to use a service object. First, let's extract the send_invite_email method into a service object named UserInviter:

require "mailgun-ruby"

class UserInviter

  def initialize(user)

    @user = user

  end

  def perform

    mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']

    mailgun.send_message 'mail.myapp.com', params

  end

  private

  def params

    {

      from: "[email protected]",

      to: @user.email,

      subject: "Welcome #{@user.name}!",

      text: "Thanks for signing up ...."

    }

  end

end

Now let's refactor our User class:

class User

  attr_accessor :name, :email, :address

  def initialize(name, email, address)

    @name = name

    @email = email

    @address = address

  end

  def create

    if save_to_database(self)

      UserInviter.new(self).perform

    end

  end

end

Our User class now only defines what it needs to and, in the future, when we decide to change how our invite emails work or what content they contain, we will just be editing the UserInviter class. We've also moved out the require "mailgun-ruby" statement so it now lives in the UserInviter class, which feels a lot more appropriate as this is a class that specifically deals with sending emails.

It's considered good practice to keep your perform methods short and concise. In our preceding example, we moved the params section into its own private method, which will make testing easier and break down our code into more logical chunks.
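
To show why a small, self-contained service object is easier to test, here is a minimal sketch that swaps the real mail client for a recording stand-in. Passing the client into the initializer is an assumption that differs from the version above, which builds the Mailgun client inside perform. FakeMailClient, the Struct-based User, and the example.com addresses are all hypothetical.

```ruby
# Hypothetical stand-in for the mail client: it records messages instead of sending them.
class FakeMailClient
  attr_reader :sent

  def initialize
    @sent = []
  end

  def send_message(domain, params)
    @sent << [domain, params]
  end
end

# Simplified stand-in for the User class, holding only what the inviter reads.
User = Struct.new(:name, :email)

class UserInviter
  # Assumption: the client is injected so that a test can supply a fake.
  def initialize(user, client)
    @user = user
    @client = client
  end

  def perform
    @client.send_message "mail.myapp.com", params
  end

  private

  def params
    {
      from: "invites@example.com", # placeholder sender address
      to: @user.email,
      subject: "Welcome #{@user.name}!",
      text: "Thanks for signing up ...."
    }
  end
end

client = FakeMailClient.new
UserInviter.new(User.new("Sarah", "sarah@example.com"), client).perform

domain, message = client.sent.first
puts domain
puts message[:subject]
```

No email is sent and no API key is needed, yet the recipient, subject, and domain can all be checked.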

Class Method Execution

Ruby is well known for being a highly readable language; it's one of the reasons why developers love the language so much. There is another common design pattern for service objects, which, in true Ruby fashion, sets out to make our service objects just a little bit nicer to work with.

For instance, instead of writing the code in the following way:

UserInviter.new(self).perform

Wouldn't it be cleaner if we could write it as follows? Let's take a closer look:

UserInviter.perform(self)

Now that feels more Ruby-like. Let's refactor our service object class one more time:

service_object_class_method.rb

require "mailgun-ruby"

class UserInviter

  def initialize(user)

    @user = user

  end

  def perform

    mailgun = Mailgun::Client.new ENV['MAILGUN_API_KEY']

    mailgun.send_message 'mail.myapp.com', params

  end

  def self.perform(*args)

    new(*args).perform

  end

  private

  def params

    {

      from: "[email protected]",

      to: @user.email,

      subject: "Welcome #{@user.name}!",

      text: "Thanks for signing up ...."

    }

  end

end

This looks the same, right? Well, it is, except for the self.perform class method:

  def self.perform(*args)

    new(*args).perform

  end

This little trick defines a class method named perform, which, when called, will simply create a new instance of the UserInviter class, pass through the arguments to the initializer, and then execute the instance method version of the perform method.
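
The forwarding can be seen in isolation in the following sketch, which uses a hypothetical UserGreeter service that returns a string instead of sending an email, so that it runs without any external gems.

```ruby
# Hypothetical service object used only to demonstrate the class-method shortcut.
class UserGreeter
  # Forwards every positional argument to initialize, then runs the instance.
  def self.perform(*args)
    new(*args).perform
  end

  def initialize(name, greeting = "Hello")
    @name = name
    @greeting = greeting
  end

  def perform
    "#{@greeting}, #{@name}!"
  end
end

long_form  = UserGreeter.new("Sarah").perform
short_form = UserGreeter.perform("Sarah")
custom     = UserGreeter.perform("Matt", "Welcome")

puts long_form
puts short_form
puts custom
```

One caveat worth keeping in mind: *args forwards positional arguments only. If the initializer took keyword arguments, the class method would probably need to accept and pass on **kwargs as well.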

We've seen now how service objects can help clean up your code and encapsulate logic into their own classes. Let's try it out now for ourselves by creating a simple service object that lists users and their position in the list.

Exercise 7.04: Building a Service Object

In this exercise, we will create a service object that lists our users. To do this, we will instantiate a new instance of the service class with our user list, which establishes our state, and then we'll print our users to the screen using the standard perform syntax:

  1. Create a new exercise_4a.rb file and add the following code. We are essentially introducing a new class to list the users and defining the iterator that will iterate through the list of users and assign index values using the .each_with_index method:

    class UserLister

      def initialize(users)

        @users = users

      end

      def perform

        @users.each_with_index do |user, idx|

          puts "User #{idx}: #{user}"

        end

      end

    end

  2. Create a new file, exercise_4b.rb, with the following code to call our service object:

    require "./exercise_4a"

    users = ['John', 'Susy', 'Sarah', 'James']

    UserLister.new(users).perform

  3. Run the exercise_4b.rb script:

    ruby exercise_4b.rb

    The output should be as follows:

Figure 7.10: Output for a service object


Knowing when to use a module and when to use a service object is more obvious in a real-world application when there are more data objects, functionality, and state objects to think about. Generally, a module is more of a collection of one-time helper functions, such as formatters or converters, whereas service objects can be instantiated with an initial state and can be used to manipulate and mutate data over a series of operations. Instantiated service objects can also be passed into methods as values, whereas modules are "included" in the class with no state.

For example, you may pass an instantiated service object that returns a mutated listing of users into another service object that interacts with the outputted users.
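
That kind of hand-off might look like the following sketch. UserSorter and SortedUserLister are hypothetical names, and returning the lines instead of printing them inside perform is an assumption made to keep the result easy to inspect.

```ruby
# Hypothetical service that returns a reordered copy of the users.
class UserSorter
  def initialize(users)
    @users = users
  end

  def perform
    @users.sort
  end
end

# Hypothetical service that accepts another service object as its input.
class SortedUserLister
  def initialize(source)
    @source = source # any object whose perform method returns a list of users
  end

  def perform
    @source.perform.each_with_index.map do |user, idx|
      "User #{idx}: #{user}"
    end
  end
end

users  = ['John', 'Susy', 'Sarah', 'James']
sorter = UserSorter.new(users)

# The instantiated sorter is passed around like any other value.
lines = SortedUserLister.new(sorter).perform
puts lines
```

The lister never needs to know how the users were ordered; it relies only on the shared perform interface.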

Activity 7.01: Presenting Voting Data in CSV Format Using Ruby Gems

In this activity, we're going to put into practice everything that we've learned in this chapter. We're going to expand our voting program, initiated in Chapter 4, Methods, to allow the importing of external voting data from a CSV file, and we're also going to improve our user experience by using the terminal-table gem to print our vote results to the Terminal in a nice, readable, formatted table.

We will build this new code into service objects so that we don't pollute our models with this extended functionality. Then, finally, we'll write a few new tests to wrap everything up and make sure our code is doing what we expect it to. To do this, perform the following steps:

  1. Start by creating a new folder, services (at the same level as models).
  2. Require everything from this directory into our application by adding the service.rb file in the top level of our application. In this file, add the following line of code:

    Dir["./services/*.rb"].each { |f| require f }

  3. Now require this file in application.rb at the top, underneath the line requiring the controller.rb file. It should now look like the following:

    # require all files in models and controllers directory

    require './model'

    require './controller'

    require './service'

  4. Create a folder called fixtures under the tests folder, and then download the votes.csv file from https://packt.live/2OzNN6a. It should contain the following data:

    category,votee,count

    VoteCategoryA,Chris Jones,23

    VoteCategoryA,Susie Bennet,29

    VoteCategoryB,Allan Green,33

    VoteCategoryB,Tony Bennet,23

  5. Create a test to check whether a category exists in the imported votes.csv file.
  6. Create the service class files, vote_importer.rb and vote_table.rb, in the services directory.
  7. In menu_controller.rb, add an option to import votes.
  8. Add an import_votes method to the voting_machine.rb model.
  9. Add a new import_controller.rb controller under the controllers folder.
  10. Update the leaderboard_controller.rb file to now log out tables instead of the raw objects.
  11. Create a votes.csv file with some test vote data that we can import in the root application directory.
  12. Run your solution.

    Here is the expected output:

    Figure 7.11: Employee of the month voting application


The Leaderboard output would look like:

Figure 7.12: Voting application dashboard


Note

The solution to the activity can be found on page 476

Summary

In this chapter, we've covered how to import and export raw CSV data in our application, how to extend the functionality of our applications by including external libraries with Ruby gems, and how to interact with the filesystem using Ruby.

These are powerful tools that can turn our applications from toys into real services with just a few simple lines of code. With these new tools, we can import data from databases, spreadsheets, or any number of other sources and process them programmatically with Ruby in any way we want; the options are endless.

We've also learned some best practices regarding how to structure code that doesn't necessarily fit within our domain models by refactoring them into service objects, keeping our models lean and clean. Your coworkers will thank you for this one, trust me.

An application is only as good as its input and output. We've now learned a few methods by which we can increase how much we input by importing external data, and we've seen how to improve what we output by using external libraries to provide a more user-friendly representation of data.

We can safely say now that we're getting the hang of this. In the next chapter, we will dive a little deeper and go beyond the basics, extending our knowledge of Ruby by looking a little closer at some more advanced functionality.
