Chapter 3: Proper Variable Usage

Anytime you need to store information in a Ruby program and access it later, you will be using some sort of variable. Which types of variables you use has a significant effect on your program's performance and maintainability. In this chapter, you'll learn about Ruby's different variable types and the advantages of using and naming them properly.

We will cover the following topics:

  • Using Ruby's favorite variable type – the local variable
  • Learning how best to use instance variables
  • Understanding how constants are just a type of variable
  • Replacing class variables
  • Avoiding global variables, most of the time

By the end of this chapter, you'll have a better understanding of the different types of variables and how best to use them.

Technical requirements

In this chapter and all chapters of this book, code given in code blocks is designed to execute on Ruby 3.0. Many of the code examples will work on earlier versions of Ruby, but not all. The code for this chapter is available online at https://github.com/PacktPublishing/Polished-Ruby-Programming/tree/main/Chapter03.

Using Ruby's favorite variable type – the local variable

Ruby's favorite variable type is the local variable. Local variables are the only variable type for which Ruby doesn't require a sigil (for example, @ or $) or capitalization. This is not by accident but by design, to nudge you toward using local variables as much as possible.

In this section, you'll learn how to improve performance by adding local variables, when it's safe to do so, issues involving scope gates, and the importance of local variable naming.

Increasing performance by adding local variables

You may be wondering, Why are local variables better than other types of variables? In Ruby, all other variable types require more indirection. Local variables require the least indirection. When you access a local variable in Ruby, the virtual machine knows the location of the local variable, and can more easily access the memory. Additionally, in most cases, the local variables are stored on the virtual machine stack, which is more likely to be in the CPU cache.

Let's say you want to have a TimeFilter class, such that you can pass an instance of it as a block when filtering:

time_filter = TimeFilter.new(Time.local(2020, 10),

                             Time.local(2020, 11))

array_of_times.filter!(&time_filter)

The purpose of the TimeFilter class is to filter enumerable objects such that only times between the first argument and the second argument are allowed through the filter. You also want to be able to leave out either of the ends, to only filter the times in one direction. One other desired usage of the TimeFilter class is to separate the times that are in the filter from times that are out of the filter, using Enumerable#partition:

after_now = TimeFilter.new(Time.now, nil)

in_future, in_past = array_of_times.partition(&after_now)

You could implement this as a method on Enumerable, but if you are writing a general-purpose library, you should not modify core classes unless that is the purpose of the library. Additionally, by writing a class that can be used as a block, you allow the class to be used by multiple methods since you could pass the block to filter! and partition as shown previously, but also to methods such as reject to remove times that are in the filter.

Here's one way you could implement this class. You need a to_proc method that returns a proc, and inside the proc you check whether the value falls between the start time and the finish time. If there is a start time and the value is before it, the proc returns false. As this is a proc and not a lambda, you use next to quickly return a value for the current block iteration. Likewise, if there is a finish time and the value is after it, the proc also returns false. Otherwise, it returns true:

class TimeFilter

  attr_reader :start, :finish

  def initialize(start, finish)

    @start = start

    @finish = finish

  end

  def to_proc

    proc do |value|

      next false if start && value < start

      next false if finish && value > finish

      true

    end

  end

end

One issue with this approach is that it is less efficient than it otherwise could be. The issue is with the implementation of to_proc. Every time the proc is called, it calls an attr_reader method to get the start time, and if there is a start time, it calls the attr_reader method again to get the start time to see whether the value is less than it. Likewise, every time the proc is called, it calls an attr_reader method to get the finish time, and if there is a finish time, it calls the attr_reader method again to get the finish time to see whether the value is greater than it.

That's up to four method calls during every block iteration, just to get the start and finish times, and up to two of them are redundant. You can remove the redundancy by caching the result of the method call in a local variable:

def to_proc

  proc do |value|

    start = self.start

    finish = self.finish

    next false if start && value < start

    next false if finish && value > finish

    true

  end

end

By calling the start method on self and setting it to a local variable, and calling the finish method on self and setting it to a local variable, you've cut the number of attr_reader method calls in half. That doesn't quite double the performance of the proc, since there is definitely time spent in the greater than and less than method calls on value, and time spent evaluating the if conditionals, but this change could improve performance by 50% or so.

However, you could definitely improve performance further. One thing to notice here is that TimeFilter doesn't offer a way to modify the start or finish times. There isn't a reason to get the start and finish times in every call of the block, since the result will be the same every time. You can hoist the setting of the local variables before the proc. Code inside the proc can still access the local variables, since the proc operates as a closure, capturing local variables in the surrounding environment. With that change, your TimeFilter#to_proc method now looks like this:

def to_proc

  start = self.start

  finish = self.finish

  proc do |value|

    next false if start && value < start

    next false if finish && value > finish

    true

  end

end

You've now completely removed the attr_reader calls from the created proc, providing another large speedup. Now, the only method calls inside the proc are the greater than and less than method calls on value.

There is no reason to stop there, as you can improve the performance even more. Because you are retrieving the start and finish variables before creating the proc, you can use them to make the returned proc more efficient. There are actually four separate cases a TimeFilter instance could represent:

  • Both start and finish are used (the common case).
  • Only start is used, finish is nil.
  • Only finish is used, start is nil.
  • Both start and finish are nil (unlikely but possible).

You can produce optimal procs for each case. These procs can be even simpler than the previous case since you don't have to check whether start and finish are valid inside the proc. If both start and finish are used, the proc checks that value is greater than or equal to start, and less than or equal to finish. If just start is used, only the start value is checked. If just finish is used, only the finish value is checked. If neither is used, there is no filter, and the proc can always return true:

def to_proc

  start = self.start

  finish = self.finish

  if start && finish

    proc{|value| value >= start && value <= finish}

  elsif start

    proc{|value| value >= start}

  elsif finish

    proc{|value| value <= finish}

  else

    proc{|value| true}

  end

end

Using local variables in this way is one of the general principles of writing fast Ruby code. Anytime you have code that can be called multiple times, using a local variable at the highest possible level to cache the results of methods will speed the code up.
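To see the pieces together, here is a minimal, self-contained sketch that combines the class definition with the four-case to_proc and exercises it with filter and partition. The sample times and the variable names october and from_october are made up for illustration:

```ruby
# Sketch: the TimeFilter class with the four-case to_proc shown above.
class TimeFilter
  attr_reader :start, :finish

  def initialize(start, finish)
    @start = start
    @finish = finish
  end

  def to_proc
    # Read the attributes once, outside the procs.
    start = self.start
    finish = self.finish

    if start && finish
      proc { |value| value >= start && value <= finish }
    elsif start
      proc { |value| value >= start }
    elsif finish
      proc { |value| value <= finish }
    else
      proc { |value| true }
    end
  end
end

# Hypothetical sample data.
times = [
  Time.local(2020, 9, 15),
  Time.local(2020, 10, 15),
  Time.local(2020, 11, 15)
]

# Both ends given: only the October time passes the filter.
october = TimeFilter.new(Time.local(2020, 10), Time.local(2020, 11))
in_october = times.filter(&october)

# Only the start given: partition splits on that one boundary.
from_october = TimeFilter.new(Time.local(2020, 10), nil)
later, earlier = times.partition(&from_october)
```

Because the & operator calls to_proc once per method call, the branch on start and finish runs once, and each element then only pays for the comparisons it actually needs.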

In the previous example, you used local variables to store the result of attr_reader method calls. However, local variables can be used to replace not just method calls, but also constants. For very performance-sensitive code that accesses constants, you can optimize it by storing constant references in local variables. For example, say you have a large array where you want to count the number of Array elements in it:

num_arrays = 0

large_array.each do |value|

  if value.is_a?(Array)

    num_arrays += 1

  end

end

Assuming that large_array is large and this code is very performance-sensitive, you can get a small speed boost by using a local variable for the Array reference:

num_arrays = 0

array_class = Array

large_array.each do |value|

  if value.is_a?(array_class)

    num_arrays += 1

  end

end

As a general rule, you should only use a local variable instead of a constant reference for code that is very performance-sensitive, as the minimal speed improvement is not worth the conceptual overhead in other cases.
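If you want to check whether the difference matters for your own code, you can time both versions. The following is only a rough sketch: the elapsed helper and the sample data are assumptions made for this example, and the timings will vary by machine and Ruby version, so measure before committing to the less readable form:

```ruby
# Hypothetical data: every third element is an array.
large_array = Array.new(300_000) { |i| i % 3 == 0 ? [] : i }

# Small timing helper (an assumption for this sketch).
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Version 1: look up the Array constant on every iteration.
constant_count = 0
constant_time = elapsed do
  large_array.each { |value| constant_count += 1 if value.is_a?(Array) }
end

# Version 2: look up the constant once and reuse the local variable.
local_count = 0
array_class = Array
local_time = elapsed do
  large_array.each { |value| local_count += 1 if value.is_a?(array_class) }
end

puts "constant: #{constant_time}s, local: #{local_time}s"
```

Both versions must produce the same count; only the time taken should differ, and probably not by much.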

Another consideration when using local variables to improve performance is to see whether you can further reduce the need for computation. For example, maybe you are writing a command-line program that will take a large array of floats, and remove values that are at least twice as large as the first argument given on the command line:

large_array.reject! do |value|

  value / 2.0 >= ARGV[0].to_f

end

Applying the principles that you've learned in this section, and realizing the command-line argument is unlikely to change after program execution, you change this to the following:

max = ARGV[0].to_f

large_array.reject! do |value|

  value / 2.0 >= max

end

This is certainly a large improvement, but you can further improve this by multiplying both sides of the comparison by 2, which is mathematically equivalent:

max = ARGV[0].to_f

large_array.reject! do |value|

  value >= max * 2

end

Then you can further improve performance by moving that calculation into the local variable:

max = ARGV[0].to_f * 2

large_array.reject! do |value|

  value >= max

end
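When you rewrite a computation like this, it is worth confirming that the optimized form selects the same elements as the original. Here is a small sketch of such a check. The argument string stands in for ARGV[0] and the values are made up. It uses the non-mutating reject so that both versions see the same input:

```ruby
argument = '1.5'                      # hypothetical stand-in for ARGV[0]
values = [0.5, 2.9, 3.0, 3.1, 10.0]   # hypothetical sample data

# Original form: divides and converts the argument on every iteration.
original = values.reject do |value|
  value / 2.0 >= argument.to_f
end

# Optimized form: all of the work on the argument is done once, up front.
max = argument.to_f * 2
optimized = values.reject do |value|
  value >= max
end

puts original == optimized
```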

In this section, you learned how to add local variables to improve the performance of your code. While this is a great idea most of the time, as you'll see in the next section, it is not always safe to do so.

Avoiding unsafe optimizations

One thing you need to remember when using local variables to optimize code is that you can only use this approach if the expression you are storing in the local variable is idempotent, meaning here that evaluating it again would return the same result and that it has no side effects.

For example, consider the following code, where you are processing a large array in order to set values in a hash:

hash = some_value.to_hash

large_array.each do |value|

  hash[value] = true unless hash[:a]

end

In this case, it looks like you could use a local variable to improve performance:

hash = some_value.to_hash

a_value = hash[:a]

large_array.each do |value|

  hash[value] = true unless a_value

end

It may even be tempting to skip iterating over the array entirely by checking whether hash[:a] has already been set:

hash = some_value.to_hash

unless a_value = hash[:a]

  large_array.each do |value|

    hash[value] = true

  end

end

Unfortunately, such an optimization is not safe in the general case. One issue is that large_array could contain :a as an element, and the purpose of the original code is to stop when :a is found. A less likely but still possible case that could have a problem is that the hash could have a default proc that sets or removes the :a entry from the hash. Before using this optimization safely, you would have to be sure that large_array cannot contain a :a element, and that the hash doesn't have a default proc that deals with the :a entry.
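The first problem is easy to reproduce. In this sketch, the array contents are made up and empty hashes stand in for some_value.to_hash. The original loop stops adding entries once :a has been set, while the version that caches hash[:a] in a local variable keeps going:

```ruby
large_array = [:x, :a, :y]   # hypothetical data that contains :a

# Original code: hash[:a] is checked again on every iteration.
original = {}
large_array.each do |value|
  original[value] = true unless original[:a]
end

# "Optimized" code: hash[:a] is read only once, before the loop.
optimized = {}
a_value = optimized[:a]
large_array.each do |value|
  optimized[value] = true unless a_value
end

p original    # :y is missing, since :a was already set
p optimized   # all three keys are present
```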

You should also avoid this approach when dealing with values that change over time, at least when you cannot ensure how long the values will last. For example, say you are removing times greater than the current time:

enumerable_of_times.reject! do |time|

  time > Time.now

end

Maybe it appears that you could use a local variable for the Time.now call:

now = Time.now

enumerable_of_times.reject! do |time|

  time > now

end

However, if enumerable_of_times only yields one time value per minute, it's probably a bad idea, since now will quickly deviate from Time.now.

You should be especially careful when using this approach if you are returning a proc containing a local variable reference from outside the scope of the proc. In any long-running program, it's almost always a bad idea to switch from the following:

greater_than_now = proc do |time|

  time > Time.now

end

To this:

now = Time.now

greater_than_now = proc do |time|

  time > now

end

It may not be a bad idea in a small command-line program that runs quickly, but if the program runs quickly, you probably don't need to optimize it.

Handling scope gate issues

Local variables in Ruby are in scope from the first time Ruby comes across them while parsing until the end of the scope they are defined in unless they hit a scope gate. In that case, they are not in scope inside the scope gate. In other words, the scope gate creates a new local variable scope. While you may not be familiar with the term scope gate, you already have a lot of experience with scope gates in Ruby, as the def, class, and module keywords all define scope gates.

The following scope gates show that at the start of each scope gate, there are no local variables:

defined?(a) # nil

a = 1

defined?(a) # 'local-variable'

module M

  defined?(a) # nil

  a = 2

  defined?(a) # 'local-variable'

  class C

    defined?(a) # nil

    a = 3

    defined?(a) # 'local-variable'

    def m

      defined?(a) # nil

      a = 4

      defined?(a) # 'local-variable'

    end

After the scope gate exits, the previous scope is restored and the value of the local variable, a, remains the same as before the scope gate was entered:

    a # 3

  end

  a # 2

end

a # 1

Additionally, calling a method defined with def in the same scope does not change the current local variables:

M::C.new.m

a # 1

All scope gates in Ruby have alternatives that do not add scope gates. The def keyword can be replaced with define_method, class with Class.new, and module with Module.new. All replacements accept a block, and blocks in Ruby are not scope gates, they are closures, which share the existing local variables of their surrounding scopes. Any local variables newly defined in a block are local to the block and blocks contained inside of it but are not available to the code outside of the block.

Replacing the scope gates in the previous example with their gateless equivalents, you end up with the following code:

defined?(a) # nil

a = 1

defined?(a) # 'local-variable'

M = Module.new do

  defined?(a) # 'local-variable'

  a = 2

  self::C = Class.new do

    defined?(a) # 'local-variable'

    a = 3

    define_method(:m) do

      defined?(a) # 'local-variable'

      a = 4

    end

Unlike the code that uses scope gates, these blocks do not restore an earlier value of a when they return, since every block shares the same local variable; a keeps whatever value was last assigned to it. This code shows the danger of not using scope gates and reusing local variables. You can see that the class and module definitions overwrite the local variable in the outer scope:

    a # 3

  end

  a # 3

end

a # 3

Even worse than that, calling the m method on the M::C instance overwrites the local variable of the surrounding scope:

M::C.new.m

a # 4

This can result in hard-to-debug issues, especially in the case where define_method is used to define methods and where such methods are not called deterministically, such as when they are called based on user input.

The trade-off of using the gateless equivalents is that they can significantly improve performance. If a method is called often and performs a computation that can be cached, it can make sense to precompute the result and use define_method instead of using def. Let's say you are defining a method named multiplier that is based on a constant value and a command-line argument:

def multiplier

  Math::PI * ARGV[0].to_f

end

This always results in the same value, but Ruby will have to compute it separately every time the method is called. Using a gateless equivalent allows you to precompute the value:

multiplier = Math::PI * ARGV[0].to_f

define_method(:multiplier) do

  multiplier

end

Note that define_method has additional overhead compared to methods defined with def, so you should only use it in cases where you can avoid at least one method call inside the defined method.
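One detail to keep in mind is that define_method is a private method of Module, so it has to be called where self is a class or module, not at the top level of a script. The following sketch places the precomputed value inside a class body. The class name Scaler and the factor string (standing in for ARGV[0]) are assumptions made for this example:

```ruby
class Scaler
  factor = '2'   # hypothetical stand-in for ARGV[0]

  # Computed once, when the class body is evaluated.
  multiplier = Math::PI * factor.to_f

  # The block is a closure, so it can see the multiplier local variable.
  define_method(:multiplier) do
    multiplier
  end
end

puts Scaler.new.multiplier
```

The class keyword is still a scope gate here, so the multiplier local variable is not visible outside the class body; only the define_method block captures it.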

Another use case for combining local variables with define_method is for information hiding. Let's say you want to define a method that is thread-safe, so it uses a mutex:

class T

  MUTEX = Mutex.new

  def safe

    MUTEX.synchronize do

      # non-thread-safe code

    end

  end

end

The problem with this code is users can easily poke around and use the constant directly:

T::MUTEX.synchronize{T.new.safe}

Because the same thread tries to lock a mutex it already holds, this raises ThreadError (deadlock; recursive locking). One way to discourage this behavior is to use a private constant:

class T

  MUTEX = Mutex.new

  private_constant :MUTEX

  def safe

    MUTEX.synchronize do

      # non-thread-safe code

    end

  end

end

This does make things slightly more difficult for the user, as accessing T::MUTEX directly will raise NameError. However, just as you can work around private methods with Kernel#send, you can work around private constants with Module#const_get:

T.const_get(:MUTEX).synchronize{T.new.safe}

In general, users that are accessing private constants deserve what they get, but if you want to make it even more difficult, you can use a local variable and define_method:

class T

  mutex = Mutex.new

  define_method(:safe) do

    mutex.synchronize do

      # non-thread-safe code

    end

  end

end

It is much more difficult for a user to get access to the local mutex variable that was defined in the T class than it is for them to access a private constant of the class.
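A quick way to convince yourself of this is to look for the mutex through the usual reflection methods. In this sketch, the :done return value is just a placeholder for the real non-thread-safe work:

```ruby
class T
  mutex = Mutex.new

  define_method(:safe) do
    mutex.synchronize do
      :done   # placeholder for the non-thread-safe code
    end
  end
end

instance = T.new
result = instance.safe

# The mutex is neither a constant of T nor an instance variable of its
# instances; it lives only in the closure captured by define_method.
has_constant = T.const_defined?(:MUTEX, false)
ivars = instance.instance_variables

p result, has_constant, ivars
```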

Naming considerations with local variables

How you name your variables has a significant effect on how readable your code is. While Ruby allows a lot of flexibility when naming local variables, in general, you should stick to lower_snake_case all-ASCII names. Emoji local variable names are cute but lead to code that is difficult to maintain. For teams that are working in a single, non-English language, non-ASCII lower_snake_case names in the local language can be acceptable, but it will make it difficult for other Ruby programmers, so strong consideration should be given to whether non-native speakers of the language will ever be working on the code.

In terms of variable length, if you name all your local variables with a single character, it becomes almost impossible to keep track of what each variable actually represents. Likewise, if each of your local variables is_a_long_phrase_like_this, simply reading your code becomes exhausting. The central trade-off in variable naming is balancing understandability with ease of reading. Appropriately naming your variables can make it so your code isn't exhausting to read, but it is still easy to comprehend.

How do you decide what length of variable name is appropriate? The general principle in local variable naming is that the length of the variable name should be roughly proportional to the inverse of the size of the scope of the variable, with the maximum length being the length of the name that most accurately describes the variable.

For example, if you are calling a method that accepts a block, and the block is only a single line or a few lines, and the receiver of the method or the method name makes it obvious what will be yielded to the block, then it may make sense to use a single-letter variable:

@albums.each do |a|

  puts a.name

end

You could also use a numbered parameter in this case:

@albums.each do

  puts _1.name

end

Because album is a fairly small name, it would also be reasonable to use album as a local variable name:

@albums.each do |album|

  puts album.name

end

However, if the context doesn't make it obvious what is being yielded, then using a single-letter variable name is a bad idea:

array.each do |a|

  puts a.name

end

Additionally, if the fully descriptive variable name is very long, it's a bad idea to use it for single-line blocks:

TransactionProcessingSystemReport.each do

  |transaction_processing_system_report|

  puts transaction_processing_system_report.name

end

Using the full name in this case makes this code harder to read, and the clarity of the longer name adds no value. In cases like this, you may not want to use a single-letter variable name, but you should probably at least abbreviate the name:

TransactionProcessingSystemReport.each do |tps_report|

  puts tps_report.name

end

Or even shorten it to this:

TransactionProcessingSystemReport.each do |report|

  puts report.name

end

If you have a 10-line method, it's probably not a good idea to use a single-letter variable throughout the method. Choose a more descriptive variable name. It doesn't have to be very long, and can certainly use abbreviations, but it should be descriptive enough that a programmer that is familiar with the code base can look at the method and not have any question about what the variable represents.

There are some common block variables for certain methods. Integer#times usually uses i, following the convention of for loops in C:

3.times do |i|

  type = AlbumType[i]

  puts type.name

  type.albums.each do |album|

    puts album.name

  end

  puts

end

While you could use a more descriptive name such as type_id, there is no significant advantage in doing so.

Likewise, when iterating over a hash, it is common to use k to represent the current key and v for the current value:

options.each do |k, v|

  puts "#{k}: #{v.length}"

end

However, you should be careful to only use this pattern in single, simple blocks. In blocks of more than three lines, and when nesting block usage, it's better to choose longer and more descriptive variable names. Let's look at this code:

options.each do |k, v|

  k.each do |k2|

    v.each do |v2|

      p [k2, v2]

    end

  end

end

You may be able to figure out that the options hash has keys and values that are both enumerable, and that this prints out each key/value pair separately, but it's not immediately obvious. More intuitive variable naming in this case would be something like this:

options.each do |key_list, value_list|

  key_list.each do |key|

    value_list.each do |value|

      p [key, value]

    end

  end

end

In any case where you are using a gateless equivalent to a scope gate, such as using define_method, be extra careful with your local variable naming, so that you don't accidentally overwrite a local variable.

One case where it can be a good idea to use a single letter or otherwise short variable name in a longer scope is when there is a defined convention in the library you are using. For example, in the Roda web toolkit, there is a convention that the request object yielded to blocks is always named r, and documentation around request methods always shows calls such as r.path or r.get. The reason for this convention is the request object is accessed very often inside blocks, and a variable name such as request or even an abbreviation such as req would make the code more taxing to read. However, in the absence of a library convention for single-letter or otherwise short variable names, you should use more descriptive variable names for longer scopes.

In this section, you've learned about Ruby's favorite variable type, the local variable. You've learned how to use local variables for safe optimizations, the issues with using scope gates, and important principles in local variable naming. In the next section, you'll learn how best to use instance variables.

Learning how best to use instance variables

Almost all objects in Ruby support instance variables. As mentioned in Chapter 1, Getting the Most out of Core Classes, the exceptions are the immediate objects: true, false, nil, integers, floats, and symbols. The reason the immediate objects do not support instance variables is that they lack their own identity. Ruby is written in C, and internally to Ruby, all Ruby objects are stored using the VALUE type. VALUE usually operates as a pointer to another, larger location in memory (called the Ruby heap). Instance variables are either stored directly in that larger location or, if it isn't large enough, in a separate location in memory that it points to.

Immediate objects are different from all other objects in that they are not pointers. Instead, they contain all information about the object in a single location in memory that is the same size as a pointer. This means there is no space for them to contain instance variables.

Additionally, unlike most other objects, there are conceptually no separate instances of immediate objects. Say you create two empty arrays like the following:

a = []

b = []

Then a and b are separate objects with their own identity. However, say you create two nil objects:

a = nil

b = nil

There is no separate identity for the nil objects. All nil objects are the same as all other nil objects, so instance variables don't really make sense for nil (and other immediate objects), because there are no separate instances.
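You can observe both points from Ruby itself. This short sketch compares object identity with equal? and then tries to set an instance variable on nil. The variable names and the :@note instance variable are made up for illustration:

```ruby
# Two array literals create two distinct objects.
first_array = []
second_array = []
arrays_identical = first_array.equal?(second_array)

# Every nil is the very same object.
first_nil = nil
second_nil = nil
nils_identical = first_nil.equal?(second_nil)

# Immediate objects are frozen, so they cannot gain instance variables.
error = begin
  nil.instance_variable_set(:@note, 1)
  nil
rescue FrozenError => e
  e
end

p arrays_identical, nils_identical, error.class
```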

In this section, you'll learn how to increase performance by using instance variables, about issues with instance variable scope, and how best to name instance variables.

Increasing performance with instance variables

Just as with local variables, you can increase performance by adding instance variables. The same principles for optimizing with local variables, in general, apply to instance variables. Most times where you have a method that is likely to be called multiple times and where the method is idempotent, you can store the result of the calculation in an instance variable to increase performance.

Let's assume you have an Invoice class that accepts an array of LineItem instances. Each LineItem contains information about the item purchased, such as the price of the item and the quantity of items purchased. When preparing the invoice, the total tax needs to be calculated by multiplying the tax rate by the sum of the total cost of the line items:

LineItem = Struct.new(:name, :price, :quantity)

class Invoice

  def initialize(line_items, tax_rate)

    @line_items = line_items

    @tax_rate = tax_rate

  end

  def total_tax

    @tax_rate * @line_items.sum do |item|

      item.price * item.quantity

    end

  end

end

If total_tax is only called once in the average lifetime of the Invoice instance, then it doesn't make sense to cache the value of it, and caching the value of it can make things slower and require increased memory. However, if total_tax is often called multiple times in the lifetime of an Invoice instance, caching the value can significantly improve performance.

In the typical case, it's common to store the results of the calculation directly in an instance variable:

  def total_tax

    @total_tax ||= @tax_rate * @line_items.sum do |item|

      item.price * item.quantity

    end

  end

In this particular case, this approach should work fine. However, there are a couple of cases where you cannot use this simple approach. First, this approach only works if the expression being calculated cannot result in a false or nil value. This is due to the ||= operator recalculating the expression if the @total_tax instance variable is false or nil. To handle this case, you should use an explicit defined? check for the instance variable:

  def total_tax

    return @total_tax if defined?(@total_tax)

    @total_tax = @tax_rate * @line_items.sum do |item|

      item.price * item.quantity

    end

  end

This will handle cases where the expression being cached can return nil or false. Note that it is possible to be more explicit and use instance_variable_defined?(:@total_tax) instead of defined?(@total_tax), but it is recommended that you use defined? because Ruby is better able to optimize it. This is because defined? is a keyword and instance_variable_defined? is a regular method, and the Ruby virtual machine optimizes the defined? keyword into a direct instance variable check.
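The difference is easiest to see with a computation that legitimately returns nil. In the following sketch, the Lookup class and its counters are hypothetical; they exist only to count how often each cached method actually recomputes its value:

```ruby
class Lookup
  attr_reader :calls_with_or, :calls_with_defined

  def initialize
    @calls_with_or = 0
    @calls_with_defined = 0
  end

  # ||= recomputes every time, because the cached value is nil.
  def with_or
    @with_or ||= begin
      @calls_with_or += 1
      nil
    end
  end

  # defined? checks whether the variable was ever set, not its value.
  def with_defined
    return @with_defined if defined?(@with_defined)

    @calls_with_defined += 1
    @with_defined = nil
  end
end

lookup = Lookup.new
3.times do
  lookup.with_or
  lookup.with_defined
end

p lookup.calls_with_or       # recomputed on every call
p lookup.calls_with_defined  # computed only once
```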

The second case where you cannot use this approach is when the Invoice instance is frozen. You cannot add instance variables to frozen objects. The solution in this case is to have an unfrozen instance variable hash inside the frozen object. Because the unfrozen hash can be modified, you can still cache values in it. You can modify the Invoice class to make sure all instances are frozen on initialization but contain an unfrozen instance variable named @cache, and that the total_tax method uses the @cache instance variable to cache values:

LineItem = Struct.new(:name, :price, :quantity)

class Invoice

  def initialize(line_items, tax_rate)

    @line_items = line_items

    @tax_rate = tax_rate

    @cache = {}

    freeze

  end

  def total_tax

    @cache[:total_tax] ||= @tax_rate *

      @line_items.sum do |item|

        item.price * item.quantity

      end

  end

end

Like the instance variable approach, the previous example also has issues if the expression can return false or nil. And you can fix those using a similar approach, with key? instead of defined?:

  def total_tax

    return @cache[:total_tax] if @cache.key?(:total_tax)

    @cache[:total_tax] = @tax_rate *

      @line_items.sum do |item|

        item.price * item.quantity

      end

  end
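To illustrate why the hash is needed, this sketch contrasts the two caching styles on a frozen object. The FrozenCounter class and the value 42 are made up for the example:

```ruby
class FrozenCounter
  def initialize
    @cache = {}
    freeze
  end

  # Tries to add a new instance variable to a frozen object.
  def direct
    @direct ||= 42
  end

  # Mutates the unfrozen hash instead, which is still allowed.
  def cached
    return @cache[:value] if @cache.key?(:value)

    @cache[:value] = 42
  end
end

counter = FrozenCounter.new
cached_value = counter.cached

direct_failed = begin
  counter.direct
  false
rescue FrozenError
  true
end

p counter.frozen?, cached_value, direct_failed
```

Freezing an object only prevents changes to the object itself; objects referenced by its instance variables, such as the @cache hash, stay mutable unless they are frozen separately.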

The other issue with this approach, and with caching in general using instance variables, is that, unlike local variables, you probably do not have control over the entire scope of the instance. When caching in local variables, you know exactly what scope you are dealing with, and can more easily determine whether using the local variable as a cache is safe. If any of the objects in the expression being cached are mutable, there is a chance that the cached value could become inaccurate, as one of the objects in the expression could be changed.

In the previous example, the Invoice class does not offer an accessor for line_items or tax_rate. Since it is frozen, you can assume tax_rate cannot be changed, since it is probably stored as a numeric value, and those are frozen by default, even if they are not immediate objects. However, consider line_items. While Invoice does not offer an accessor for it, the values passed in could be modified after they are passed in and the total tax has been calculated:

line_items = [LineItem.new('Foo', 3.5r, 10)]

invoice = Invoice.new(line_items, 0.095r)

tax_was = invoice.total_tax

line_items << LineItem.new('Bar', 4.2r, 10)

tax_is = invoice.total_tax

With this example, tax_was and tax_is will be the same value, even though the Invoice instance's line items have changed. To avoid this issue, there are a couple of approaches. The first approach is that Invoice could duplicate the line items, so that changes to the line items used as an argument do not affect the invoice:

def initialize(line_items, tax_rate)

  @line_items = line_items.dup

  @tax_rate = tax_rate

  @cache = {}

  freeze

end

This still allows someone to use instance_variable_get(:@line_items) to get the array of line items and modify it.

The second approach is freezing the line items:

def initialize(line_items, tax_rate)

  @line_items = line_items.freeze

  @tax_rate = tax_rate

  @cache = {}

  freeze

end

This is a better approach, except that it mutates the argument, and in general it is a bad idea for any method to mutate arguments that it doesn't control unless that is the sole purpose of the method. The safest approach is the combination of approaches:

def initialize(line_items, tax_rate)

  @line_items = line_items.dup.freeze

  @tax_rate = tax_rate

  @cache = {}

  freeze

end

This makes sure that the array of line items cannot be modified. However, there is still a way for the resulting calculation to go stale, and that is if one of the line items is modified directly:

line_items = [LineItem.new('Foo', 3.5r, 10)]

invoice = Invoice.new(line_items, 0.095r)

tax_was = invoice.total_tax

line_items.first.quantity = 100

tax_is = invoice.total_tax

Here you are modifying the quantity in the first line item, which should result in a change to the total tax. To avoid this issue, you need to make sure you can freeze the line items. One approach is to make all LineItem instances frozen:

LineItem = Struct.new(:name, :price, :quantity) do

  def initialize(...)

    super

    freeze

  end

end

However, if you don't want to take that approach, and only want to freeze line items given on the invoice, in the Invoice#initialize method, you can map over the list of line items, return a frozen dup of each item, and then freeze the resulting array:

def initialize(line_items, tax_rate)

  @line_items = line_items.map do |item|

    item.dup.freeze

  end.freeze

  @tax_rate = tax_rate

  @cache = {}

  freeze

end

You've now learned that in order to get the maximum benefit of caching inside objects, you need to be dealing with frozen objects, but where each frozen object has an unfrozen cache.
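The pieces above can be combined into one runnable sketch. The initialize method is the one from the text, but the total_tax implementation shown here is an assumption made for illustration (the Invoice#total_tax defined earlier in the chapter may differ):

```ruby
LineItem = Struct.new(:name, :price, :quantity)

class Invoice
  def initialize(line_items, tax_rate)
    # Frozen copies of each item, stored in a frozen array
    @line_items = line_items.map do |item|
      item.dup.freeze
    end.freeze
    @tax_rate = tax_rate
    @cache = {} # deliberately left unfrozen so it can hold cached values
    freeze
  end

  # Assumed implementation: tax rate times the sum of the line item totals
  def total_tax
    @cache[:total_tax] ||= @tax_rate *
      @line_items.sum { |item| item.price * item.quantity }
  end
end

line_items = [LineItem.new('Foo', 3.5r, 10)]
invoice = Invoice.new(line_items, 0.095r)
tax_was = invoice.total_tax

# The caller can still change its own array and items...
line_items << LineItem.new('Bar', 4.2r, 10)
line_items.first.quantity = 100
tax_is = invoice.total_tax

# ...but the invoice's private copies cannot be changed.
own_items = invoice.instance_variable_get(:@line_items)
frozen_error = begin
  own_items.first.quantity = 100
  nil
rescue FrozenError => e
  e
end
```

Here tax_was and tax_is are equal, and that is now the correct result rather than a stale one, because the data the invoice calculates from never changed.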

Handling scope issues with instance variables

Like local variables, instance variables have their own scopes, but unlike local variables, the scope of instance variables is not lexical. The scope of instance variables is always the same as the implicit receiver of methods, self. The scope gates discussed in Handling scope gate issues (def, class, and module) also change instance variable scope. However, their gateless equivalents (define_method, Class.new, and Module.new) also change instance variable scope, since they have a new self.

One of the main issues to be concerned with when using instance variables is using them inside blocks passed to methods you do not control. Let's assume you were using the Invoice class from the previous section, but you want to add a method named line_item_taxes that returns an array of taxes, one for each line item. One way to implement this would be a map over the line items, with the total price of the line item multiplied by the tax rate of the invoice:

class Invoice

  def line_item_taxes

    @line_items.map do |item|

      @tax_rate * item.price * item.quantity

    end

  end

end

This would work in most cases, but there is a case where it would fail. In this example, you are assuming that @line_items is an array of LineItem instances. However, that doesn't necessarily have to be the case. Instead of a simple array, the passed-in line_items argument could be an instance of a separate class:

class LineItemList < Array

  def initialize(*line_items)

    super(line_items.map do |name, price, quantity|

      LineItem.new(name, price, quantity)

    end)

  end

  def map(&block)

    super do |item|

      item.instance_eval(&block)

    end

  end

end

Invoice.new(LineItemList.new(['Foo', 3.5r, 10]), 0.095r)

One reason to implement such a class is to make it easier to construct a literal list of line items, by just providing arrays of name, price, and quantity to the LineItemList initializer, and having it automatically create the LineItem instances. To make things even easier for the user, the LineItemList class has a map method that evaluates the block passed to it in the context of the item, in addition to passing the item as a variable to the block. This allows for simpler code inside the block, as long as you are only accessing local variables and methods of the current line item. For example, you can generate an array of total costs for each line item more easily:

line_item_list.map do

  price * quantity

end

Instead of the following more verbose code:

line_item_list.map do |item|

  item.price * item.quantity

end

The trade-off in this case is that doing this changes the scope of the block from the caller's scope to the scope of the line item. This breaks the example used earlier, because the @tax_rate reference is no longer the tax rate of the invoice, but the tax rate of the line item. As LineItem doesn't have a @tax_rate instance variable, the instance variable access returns nil, and this likely results in NoMethodError:

class Invoice

  def line_item_taxes

    @line_items.map do |item|

      # With LineItemList#map, self is the line item here, so @tax_rate is nil

      @tax_rate * item.price * item.quantity

    end

  end

end

You can work around this case by assigning the instance variable to a local variable before the block and accessing the local variable inside the block. As explained in Increasing performance by adding local variables, that's probably a good idea anyway, as it is likely to improve the overall performance. This is because accessing local variables is faster than accessing instance variables. Let's switch the example to store the instance variable in a local variable for better performance:

class Invoice

  def line_item_taxes

    tax_rate = @tax_rate

    @line_items.map do |item|

      tax_rate * item.price * item.quantity

    end

  end

end

Issues like this are one reason why it's generally a bad idea for code to use methods such as instance_eval and instance_exec without a good reason. Using instance_eval or instance_exec on blocks that are likely to be called inside user code, as opposed to blocks used for configuration, can be a common source of bugs. In this particular case, the issue shows up with instance variable use, but it also occurs any place methods of the surrounding scope are called implicitly, or when self is used directly.
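To see both the failure and the workaround in one place, here is a self-contained sketch. The LineItemList class is the one from the text; the Invoice shown is a stripped-down stand-in, and the method name broken_line_item_taxes is hypothetical, used only so both versions can exist side by side:

```ruby
LineItem = Struct.new(:name, :price, :quantity)

class LineItemList < Array
  def initialize(*line_items)
    super(line_items.map do |name, price, quantity|
      LineItem.new(name, price, quantity)
    end)
  end

  # Evaluates the block with self set to each line item
  def map(&block)
    super do |item|
      item.instance_eval(&block)
    end
  end
end

class Invoice
  def initialize(line_items, tax_rate)
    @line_items = line_items
    @tax_rate = tax_rate
  end

  # Instance variable inside the block: looked up on the line item
  def broken_line_item_taxes
    @line_items.map do |item|
      @tax_rate * item.price * item.quantity
    end
  end

  # Local variable inside the block: unaffected by the change of self
  def line_item_taxes
    tax_rate = @tax_rate
    @line_items.map do |item|
      tax_rate * item.price * item.quantity
    end
  end
end

invoice = Invoice.new(LineItemList.new(['Foo', 3.5r, 10]), 0.095r)

error = begin
  invoice.broken_line_item_taxes
  nil
rescue NoMethodError => e
  e
end

taxes = invoice.line_item_taxes
```

The first call fails because nil does not respond to *, while the second returns one tax amount per line item.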

Naming considerations for instance variables

Like local variables, instance variables should be named with @lower_snake_case with all-ASCII characters. One exception to this is when using instance variables with anonymous classes and modules (generally when testing), in which @ClassName and @ModuleName are also acceptable. Like local variables, avoid emoji in instance variable names, and only use non-ASCII characters with localized names when the code is being maintained solely in that language.

Since instance variable scope is not lexical, you never know how long the scope will be, and therefore you should avoid single-letter or other very short instance variable names. However, because instance variables are internal to the object and easy to refactor later, you generally should not need to use long descriptive instance variable names.

Using the TransactionProcessingSystemReport example from Naming considerations with local variables, if you have to store a TransactionProcessingSystemReport instance in an instance variable, the fully descriptive name is probably too long:

@transaction_processing_system_report =

  TransactionProcessingSystemReport.new

You should probably use an abbreviated name:

@tps_report = TransactionProcessingSystemReport.new

Or even simpler if the object only deals with a single type of report:

@report = TransactionProcessingSystemReport.new

In this section, you learned how to use instance variables to improve performance, about issues with instance variable scope, and important principles in instance variable naming. In the next section, you'll learn that Ruby's constants are really variables in disguise.

Understanding how constants are just a type of variable

Ruby has constants, but unlike constants in most other languages, Ruby's constants are actually variables. It's not even an error in Ruby to reassign a constant; it only generates a warning. Say you try the following code:

A = 1

A = 2

Then you'll see it only generates two warnings:

# warning: already initialized constant A

# warning: previous definition of A was here

At best, Ruby's constants should be considered only as a recommendation. That being said, not modifying a constant is a good recommendation. In general, you shouldn't modify constants unless you have to, especially constants that are in external code such as libraries.

You can think of a constant in Ruby as a variable type that can only be used by modules and classes, with different scope rules. As both modules and classes are objects, they can both have instance variables in addition to constants. When a class or module needs to store information, you should consider whether an instance variable or a constant is more appropriate.

Handling scope issues with constants

Constant scope in Ruby is different from both local variable scope and instance variable scope. In some ways, it is lexical, but it's not truly lexical as the constant doesn't have to be declared in the same lexical scope in which it is accessed. Constant scope and resolution is one of the more involved parts of Ruby, and even many experienced Ruby programmers probably forget how it works in detail.

It's easiest to learn Ruby's constant scope rules by example. You can start by defining class A, with constants W, X, Y, and Z. You can also define constants U and Y in Object, as it will be easier to learn constant resolution with them. As A does not specify a superclass, Ruby will make it a subclass of Object:

class A

  W = 0

  X = 1

  Y = 2

  Z = 3

end

class Object

  U = -1

  Y = -2

end

You can make a subclass of A named B, and define constants X and Z inside B:

class B < A

  X = 4

  Z = 5

end

If you open up the B class in a separate scope, you can check the value of each of U, W, X, Y, and Z to see how constant resolution works:

class B

  U # -1, from Object

  W # 0, from A

  X # 4, from B

  Y # 2, from A

  Z # 5, from B

end

We see X and Z use the value directly defined in B, while W and Y use the value from A (the superclass of B), and U uses the value from Object (the superclass of the superclass of B). From this example, you can see that constant lookup checks the class or module itself first. Only if the constant isn't found there does it check the superclass, and it continues up the ancestor chain in the same way until the constant is found.

For a single-class definition, that's all you need to worry about in regards to constant resolution. However, the situation gets significantly more complex when you have a class or module definition inside another class or module definition. To illustrate this, you can define another subclass of A named C that just defines a constant, Y:

class C < A

  Y = 6

end

You can also define a class, D, that defines a constant, Z:

class D

  Z = 7

end

And then a subclass of D named E that defines a constant, W:

class E < D

  W = 8

end

To further understand constant resolution, we will look at two different possible ways to nest constants. The first one is where class C is nested under class E. You need to use class ::C in this case so that you reopen the top-level C class and do not create an E::C class:

class E

  class ::C

    U # -1, from Object

    W # 8, from E

    X # 1, from A

    Y # 6, from C

    Z # 3, from A

  end

end

From these results, you can see that E takes priority over A (the superclass of C) because both E and A define the constant W, but the constant resolution of W inside C will find the constant in E before it finds the constant in the superclass of C. However, for the constant Z, it is defined in both D (the superclass of E) and A (the superclass of C), but the value used is from A and not D.

If you switch the nesting, you get different results:

class C

  class ::E

    U # -1, from Object

    W # 8, from E

    X # NameError

    Y # 6, from C

    Z # 7, from D

  end

end

Here, you get NameError for X but not for Z. X is defined in A, which is the superclass of C, while Z is defined in D, the superclass of E.

Just to make sure you get a more complete understanding, let's nest both C and E under B:

class B

  class ::C

    class ::E

      U # -1, from Object

      W # 8, from E

      X # 4, from B

      Y # 6, from C

      Z # 5, from B

    end

  end

end

Here you can see that X and Z now resolve to the constants in B. Because Z is defined in both D and B, you can see that the lexical nesting in B takes precedence over the superclass resolution in E.

From this example, you can probably guess Ruby's constant lookup algorithm:

  1. Look in the current namespace (W in the previous example).
  2. Look in the lexical namespaces containing the current namespace (X, Y, and Z in the previous example).
  3. Look in the ancestors of the current namespace, in order (U in the previous example).
  4. Do not look in ancestors of the lexical namespaces containing the current namespace.
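The four rules can be sketched in plain Ruby. The lookup_constant helper below is hypothetical and simplified: it ignores autoload, const_missing, and the extra check of Object that Ruby performs when the current namespace is a module. The nesting arrays passed to it are what Module.nesting would return at the corresponding points in the earlier examples:

```ruby
class A; W = 0; X = 1; Y = 2; Z = 3; end
class Object; U = -1; Y = -2; end
class B < A; X = 4; Z = 5; end
class C < A; Y = 6; end
class D; Z = 7; end
class E < D; W = 8; end

# Hypothetical helper: resolves name roughly the way a bare constant
# reference would, given the lexical nesting (innermost namespace first).
def lookup_constant(name, nesting)
  # Rules 1 and 2: the current namespace, then each enclosing lexical
  # namespace. Passing false skips ancestors, which is rule 4.
  nesting.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end

  # Rule 3: the ancestors of the current (innermost) namespace, in order.
  current = nesting.first || Object
  current.ancestors.each do |mod|
    return mod.const_get(name, false) if mod.const_defined?(name, false)
  end

  raise NameError, "uninitialized constant #{name}"
end

# class E; class ::C; ...; end; end
w_in_c_under_e = lookup_constant(:W, [C, E])    # E is checked before A
z_in_c_under_e = lookup_constant(:Z, [C, E])    # D is never checked

# class B; class ::C; class ::E; ...; end; end; end
z_in_e_under_c_b = lookup_constant(:Z, [E, C, B]) # B is checked before D
u_in_e_under_c_b = lookup_constant(:U, [E, C, B]) # found via E's ancestors
```

Calling lookup_constant(:X, [E, C]) raises NameError, matching the class C; class ::E example, because X is defined only in A and B, and neither is in the nesting or in the ancestors of E.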

Stated in four brief rules, the algorithm is not difficult to understand, but constant scope is still much trickier than class instance variable scope, which is always the same no matter the nesting:

class C

  @a # instance variable of C

end

class B

  class ::C

    @a # same instance variable of C

  end

end

In this section, you've learned that constant scope in Ruby may not be intuitive, but it can be understood by remembering four simple rules. You also saw how constants and class instance variables differ in terms of scope. In the next section, you'll learn how constants and class instance variables differ in terms of visibility.

Visibility differences between constants and class instance variables

One significant difference between constants and class instance variables is that constants are externally accessible by default, whereas class instance variables are like all instance variables and not externally accessible by default. You can make constants not externally accessible using private_constant:

class A

  X = 2

  private_constant :X

end

A::X

# NameError

However, this error occurs only when getting the value of the constant; you can still set the value of the constant with only a warning:

A::X = 3

# warning: already initialized constant A::X

Note that reassigning the constant does not change the external visibility; you still get a NameError if trying to externally access the constant after reassigning the value:

A::X

# NameError

You have to explicitly set the constant as public using public_constant to make it externally accessible again:

class A

  public_constant :X

end

A::X # 3
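One detail worth keeping in mind is that private_constant only restricts access through the Namespace::CONSTANT form from outside the namespace; code inside the class can still reference the constant by its bare name. A minimal sketch, using a hypothetical Account class and RATE constant that are not part of the chapter's examples:

```ruby
class Account
  RATE = 2
  private_constant :RATE

  # Bare references from inside the class still work
  def self.rate
    RATE
  end
end

internal = Account.rate

# Qualified access from outside raises NameError
external = begin
  Account::RATE
rescue NameError => e
  e.class
end
```

This is what makes private constants practical for implementation details: the class keeps using them normally, and only outside code is affected.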

For class instance variables, you can make them externally accessible similar to how you make instance variables accessible for regular objects, by calling attr_reader or attr_accessor. When making instance variables accessible for other objects, you generally make them accessible for all instances of the same class, so you define attr_reader or attr_accessor on the class itself.

However, you don't want to define accessors for class instance variables for all classes (all instances of the Class class); you only want to define accessors for instance variables for a specific instance of Class. In this case, you would do the same thing for a class as you would if you wanted to define accessors for only a specific instance of the class. You would define the methods on the singleton class of the object:

class A

  @a = 1

  class << self

    attr_reader :a

  end

end

A.a # 1

In this example, attr_reader is called on the singleton class of A, which makes the A.a method return the value of the @a class instance variable of A.

You'll learn about more differences between constants and class instance variables later in this chapter, where you'll learn about replacements for class variables.

Naming considerations with constants

The naming of constants depends on whether they are classes/modules or other objects. Classes and modules should use CamelCase. Other objects should use ALLCAPS_SNAKE_CASE. Ruby follows these conventions internally. You have class names such as ArgumentError and BasicObject, and other constant names such as TOPLEVEL_BINDING and RUBY_ENGINE.

Like local and instance variables, it's best to keep this to all-ASCII names. Avoid emoji in constant names, and only use non-ASCII characters with localized names when the code is being maintained solely in that language.

In general, it's best to keep class and module names long and descriptive. In cases where the entire class name becomes tedious to use, the class can be stored with a shorter name in a local variable, instance variable, or other constant.

Similar to local variable names, one case where constant names can be short is when there is a defined convention in the library being used for short constant names. For example, in the Sequel database library, the convention is to store the Sequel::Database instance in a constant named DB, since there is usually only one instance initialized in each application. All of the documentation for the library uses this convention, and users are strongly encouraged to follow it. However, in the absence of a library or application convention for short constant names for specific constants that are used constantly in the application, constant names should be long and descriptive.

In this section, you learned how constants are just a type of variable, how constant scope works, how constants differ from class instance variables in terms of scope, and important principles when naming constants. In the next section, you'll learn about Ruby's class variables, and what you should use instead.

Replacing class variables

There are a few features in Ruby you should never use, and class variables are one of them. Class variable semantics are bad enough that the Ruby core team now recommends against their use, and no longer considers it worth it to even fix bugs in how class variables are handled. This is a shame because class variables almost have behavior you want. However, class variable behavior is just different enough from what you want to not be useful.

At first appearance, class variables have desirable qualities:

  • You can access them in the class itself.
  • You can access them when reopening the singleton class in the class itself.
  • You can access them in the class's methods.
  • You can access them in all of these places in any of the class's subclasses.

Here's an example:

class A

  @@a = 1

  class << self

    @@a

  end

  def b

    @@a

  end

end

class B < A

  @@a

end

So far, so good. However, what happens when you change the value of the class variable in B?

class B

  @@a = 2

end

A.new.b # 2

Changing the class variable in B doesn't affect just the class variable in B as you might expect; it changes the class variable in A as well. This is because class variables aren't really specific to a class but to a class hierarchy. Therefore, you can never safely define a class variable in any class that is subclassed or any module that is included in other classes, ruling out their safe usage completely in libraries.

That's weird and bad, but it gets worse. Let's say you have a class variable in B:

class B

  @@b = 3

  

  def c

    @@b

  end

end

B.new.c # 3

Okay, that works as expected. What happens if, later, you try to access the class variable from A, the superclass of B?

class A

  @@b # NameError

end

You get NameError. That's good, because you never defined the class variable in A, and surely you don't want the class variable to propagate up to superclasses?

What happens if, later, you define a class variable with the same name in A?

class A

  @@b = 4

end

Ruby doesn't complain about this; it doesn't even issue a warning. However, what if you later call that B#c method?

B.new.c

# RuntimeError (class variable @@b of B is overtaken by A)

You get RuntimeError. RuntimeError is raised when the class variable is accessed, instead of when the class variable was overridden in the superclass. This RuntimeError may not occur when your application is loaded, only later when the method is called.

This means it is never safe to define a class variable in a subclass because if the same class variable name is added to a superclass, it will break the subclass.

Since you can't safely use a class variable in a subclass, and can't safely use a class variable in a superclass or module, there really isn't any way to use them safely. That plus the fact that modifying a class variable in a subclass changes the value of the class variable in the superclass means that there is no reason to use them.

There are at least three reasonable separate approaches for replacing class variables in Ruby, which you'll learn about in the following sections.

Replacing class variables with constants

One possible approach to replacing class variables is using constants instead. Constants have a nice property that they already operate more or less sanely in a class hierarchy:

class A

  C = 1

end

class B < A

  C # 1

end

Accessing a constant will use the constant defined in the superclass, as you saw in Handling scope issues with constants earlier in this chapter. What happens when you set the constant in the subclass?

class B

  C = 2

end

class B

  C # 2

end

class A

  C # 1

end

It only sets the constant value in the subclass; it does not propagate the change to the superclass. That's much better than class variable behavior.

What's the downside of using constants as a replacement for class variables? The main downside is that, as you learned in Understanding how constants are just a type of variable, Ruby warns you when you change the value of a constant:

class B

  C += 1 # warning

end

Also, while you can access a constant inside a method, you can't set a constant inside a method, at least not using the standard constant setting syntax:

class B

  def increment

    # would be SyntaxError, dynamic constant assignment

    # C += 1

  end

end

You have to use Module#const_set:

class B

  def increment

    self.class.const_set(:C, C + 1)

  end

end

This is still a poor approach as it warns on every call to the method.

Because a constant can refer to a mutable object, it is possible to allow reassignment behavior without actually reassigning the constant itself:

class B

  C = [0]

  def increment

    C[0] += 1

  end

end

Using a mutable constant to work around constant reassignment warnings is definitely a hack and not an implementation recommendation. It's a bad idea to use this approach, for the same reason it is bad to rely on globally mutable data structures in general.

For class variables that do not need to be modified, using a constant instead should work fine. However, in any case where you will be reassigning the value, it is a bad idea to use a constant, and you should use one of the next two approaches instead.

Replacing class variables with class instance variables using the superclass lookup approach

If you cannot replace your class variable with a constant because you are reassigning it, you should replace it with a class instance variable. However, like all instance variables, class instance variables are specific to the class itself and are not automatically propagated to subclasses. One approach to work around this fact is to look in the superclass if you don't find the instance variable in the current class, called the superclass lookup approach.

To implement this approach, let's continue with our example with class A and subclass B, but this time class A has an instance variable @c with a value of 1:

class A

  @c = 1

end

class B < A

end

Let's say you want to get the value of @c from B using the superclass lookup approach. This involves either a recursive or iterative approach to look in superclasses. Here's how you could code the iterative approach:

class B

  if defined?(@c)

    c = @c

  else

    klass = self

    while klass = klass.superclass

      if klass.instance_variable_defined?(:@c)

        c = klass.instance_variable_get(:@c)

        break

      end

    end

  end

end

If B already defines the instance variable, you just use the defined value. Otherwise, you look in the superclass of B and see whether it defines the instance variable. If it does, you use that value; if not, you try the next superclass.

As you can see, this is a lot of code for every time you want to access the instance variable, so almost always this would be wrapped in a class method of the superclass so that it works for all subclasses:

def A.c

  if defined?(@c)

    @c

  else

    klass = self

    while klass = klass.superclass

      if klass.instance_variable_defined?(:@c)

        return klass.instance_variable_get(:@c)

      end

    end

  end

end

A.c # 1

B.c # 1

It's still simple to set an explicit class instance variable value inside class B, and the iterative approach will pick it up:

class B

  @c = 2

end

A.c # 1

B.c # 2

The recursive approach is similar to the iterative approach, except that it uses recursion instead of iteration in the lookup method. This is actually a much simpler approach in terms of code, and it performs better as well, due to fewer and simpler method calls:

def A.c

  defined?(@c) ? @c : superclass.c

end

One advantage of the superclass lookup approach is that if you change the class instance variable value in the superclass without changing it in the subclass, calling the lookup method in the subclass will reflect the changed value in the superclass. Another advantage is that the superclass approach uses minimal memory. The disadvantage is the variable lookup can take significantly more time, at least for deep hierarchies, especially if it is unlikely you'll be changing the value in subclasses. This is a classic processing time versus memory trade-off. The superclass lookup approach makes the most sense if reduced memory is more important than processing time.
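The first advantage can be illustrated with a short sketch. The class names Parent, Child, and Override are hypothetical, chosen to avoid clashing with the A and B examples above; the lookup method is the recursive version from the text:

```ruby
class Parent
  @c = 1

  # Recursive superclass lookup
  def self.c
    defined?(@c) ? @c : superclass.c
  end
end

class Child < Parent
end

class Override < Parent
  @c = 2
end

before = Child.c

# Change the value in the superclass only
Parent.instance_variable_set(:@c, 10)

after = Child.c
overridden = Override.c
```

Child never stores its own @c, so it costs no extra memory and always reflects the current value in Parent, while Override keeps the value it set explicitly.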

Replacing class variables with class instance variables using the copy to subclass approach

The alternative to the superclass lookup approach when replacing class variables with class instance variables is copying each instance variable into the subclass when the subclass is created. This approach requires that you set up the support for it before creating subclasses.

In order to modify each subclass as soon as it is created, you use the inherited singleton method of the superclass. This method is called with each subclass created and can be used to modify the created subclass. In your inherited method, for each of the class instance variables you want to copy into the subclass, you call instance_variable_set on the subclass:

class A

  @c = 1

  def self.inherited(subclass)

    subclass.instance_variable_set(:@c, @c)

  end

end

class B < A

  @c # 1

end

This approach has the advantage that you can access the instance variables directly in subclasses without having to use a special method. This makes accessing the values in the subclass faster. The disadvantage is that if you change the value of the variable in A without having modified the value in B, looking up the value in B will reflect the initial value that was set when B was created, instead of the current value in A. Additionally, the subclass copy approach requires more memory, especially if you have a large number of instance variables you need to copy into the subclass and/or a large number of subclasses.
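The stale-value trade-off can be seen in a short sketch. The class names are hypothetical, and the attr_accessor on the singleton class is added only to make the values easy to read and write from outside; calling super in inherited is a defensive habit so that any other inherited hooks still run:

```ruby
class Parent
  @c = 1

  def self.inherited(subclass)
    super
    # self is the direct superclass, so its current value is copied
    subclass.instance_variable_set(:@c, @c)
  end

  class << self
    attr_accessor :c
  end
end

class Child < Parent
end

# Changing the superclass value after Child exists does not affect Child
Parent.c = 10

class LateChild < Parent
end

class GrandChild < Child
end
```

Child.c is still 1, LateChild.c is 10 because it was created after the change, and GrandChild.c is 1 because the value is copied from its direct superclass, Child.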

In this section, you learned that you should never use class variables, and you learned three approaches to replacing them. In the next section, you'll learn about Ruby's final variable type, the global variable.

Avoiding global variables, most of the time

Global variables are available in Ruby, but in general, their use is discouraged unless it is necessary. Some examples where it may make sense for you to use global variables are when you are modifying the load path:

$LOAD_PATH.unshift("../lib")

Or when you are silencing warnings in a block (assuming you actually have a good reason to do that):

def no_warnings

  verbose = $VERBOSE

  $VERBOSE = nil

  yield

ensure

  $VERBOSE = verbose

end

Or lastly, when reading/writing to the standard input, output, or error:

$stdout.write($stdin.read) rescue $stderr.puts($!.to_s)

These are all cases where you are using the existing global variables. It rarely makes sense to define and use your own global variables, even though Ruby does make it easy to use global variables since they are global and available everywhere.

The main issues with using global variables in Ruby are the same as using global variables in any programming language, in that they encourage poor coding style and hard-to-follow code. Additionally, because there is only one shared namespace for global variables, there is a greater chance of variable conflicts. Let's say you have code like the following:

class SomeObject

  def current_user

    $current_user

  end

end

And somewhere else in your application is the following:

$current_user = User[user_id]

It's probably going to be a pain to use parts of your application in a script that doesn't set $current_user. Global variables make this type of setup easy, but in general, this is a Faustian bargain, as you are accepting long-term architectural problems in exchange for convenient localized access. This approach almost always results in significant technical debt as soon as it is committed.

As you'll learn, it's fairly easy to replace global variables, but using an approach that avoids global variables while keeping the same architecture does not fix anything. If you need information in a low-level part of your application that comes from a high-level part of your application, do not take the shortcut of using a global variable or any similar approach. Properly pass the data as method arguments all the way from the high level to the low level. Otherwise, you are just setting yourself up for long-term problems.
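As a minimal sketch of passing the data down explicitly, SomeObject could receive the current user as an argument instead of reading $current_user. The User struct and the constructor signature here are assumptions made for illustration, not part of the original example:

```ruby
# Hypothetical stand-in for the application's user class
User = Struct.new(:id, :name)

class SomeObject
  attr_reader :current_user

  # The caller decides which user applies; nothing global is consulted
  def initialize(current_user)
    @current_user = current_user
  end
end

user = User.new(1, 'Alice')
object = SomeObject.new(user)
```

With this shape, a script or test can construct SomeObject with whatever user it needs, without any global setup.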

That being said, there are cases where you need a global value or some global state. For example, if you are writing a batch processing system for the invoices discussed earlier in the chapter and you want to print a period for every 100 invoices processed as a minimal form of progress indicator, you could use a global variable as a quick way to implement it. You could initialize your global variable at the start of your program:

$invoices_processed = 0

And then every time you process an invoice:

$invoices_processed += 1

if $invoices_processed % 100 == 0

  print '.'

end

To avoid the use of a global variable, it's possible to switch to a constant object with some useful helper methods:

INVOICES_PROCESSED = Object.new

INVOICES_PROCESSED.instance_eval do

  @processed = 0

  def processed

    @processed += 1

    if @processed % 100 == 0

      print '.'

    end

  end

end

And then when you process an invoice, you can use simpler code:

INVOICES_PROCESSED.processed

If you don't want to use a single constant with specialized behavior, you can also just add an accessor to an existing singleton, such as the Invoice class:

class Invoice

  @number_processed = 0

  singleton_class.send(:attr_accessor, :number_processed)

end

And then your invoice processing code can use similar code as was used for the global variable:

Invoice.number_processed += 1

if Invoice.number_processed % 100 == 0

  print '.'

end

About the only time to use a global variable instead of a singleton accessor method or a specialized constant is when you need the absolute maximum performance, as global variable getting and setting is faster than calling a method. In all other cases, defining your own global variables should be avoided.

Summary

In this chapter, you learned all about Ruby's different variable types. You learned how to use local variables whenever possible. You also learned how both local variables and instance variables can provide substantial performance benefits with intelligent caching.

Moving on, we covered that constants are just another type of variable and that both constants and class instance variables can replace the use of class variables. Finally, you learned about global variables and how to replace their usage with constants or accessor methods on singletons.

Most importantly, in this chapter, you learned when it is appropriate to use each of Ruby's variable types, and how to properly name them, which are two of the most important factors in writing Ruby programs that are easy to maintain.

In the next chapter, you'll build on this knowledge, and learn about methods and how best to use their many types of arguments.

Questions

  1. Is it always a good idea to use long descriptive names for local variables?
  2. When using instance variables for caching, why is it important that the object be frozen?
  3. A constant named SomeValue probably contains an instance of what type of Ruby class?
  4. When should you use class variables?
  5. Should you always avoid using global variables?

Further reading

Numbered parameters: https://docs.ruby-lang.org/en/3.0.0/Proc.html#class-Proc-label-Numbered+parameters
