Ever sent an email to the wrong contact? You probably had a hard time sorting out the confusion that ensued. Well, Ruby objects are just like those contacts in your address book, and calling methods on them is like sending messages to them. If your address book gets mixed up, it’s possible to send messages to the wrong object. This chapter will help you recognize the signs that this is happening, and help you get your programs running smoothly again.
The word continues to spread—if someone has a Ruby problem, your company can solve it. And so people are showing up at your door with some unusual dilemmas...
This astronomer thinks he has a clever way to save some coding. Instead of typing my_star = CelestialBody.new and my_star.type = 'star' for every star he wants to create, he wants to just copy the original star and set a new name for it.
But the plan seems to be backfiring. All three of his CelestialBody instances are reporting that they have the same name!
The bug in the star catalog program stems from an underlying problem: the developer thinks he’s working with multiple objects, when actually he’s operating on the same object over and over.
To understand how that can be, we’re going to need to learn about where objects really live, and how your programs communicate with them.
Rubyists often talk about “placing objects in variables,” “storing objects in arrays,” “storing an object in a hash value,” and so forth. But that’s a simplification of what actually happens, because you can’t actually put an object in a variable, array, or hash.
Instead, all Ruby objects live on the heap, an area of your computer’s memory allocated for object storage.
When a new object is created, Ruby allocates space on the heap where it can live.
Generally, you don’t need to concern yourself with the heap; Ruby manages it for you. The heap grows in size if more space is needed, and objects that are no longer used get cleared off the heap automatically (a process called garbage collection).
But we do need to be able to retrieve items that are stored on the heap. And we do that with references. Read on to learn more about them.
When you want to send a letter to a particular person, how do you get it to them? Each residence in a city has an address that mail can be sent to. You simply write the address on an envelope. A postal worker then uses that address to find the residence and deliver the letter.
When a friend of yours moves into a new residence, they give you their address, which you then write down in an address book or other convenient place. This allows you to communicate with them in the future.
Ruby uses references to locate objects on the heap, like you might use an address to locate a house. When a new object is created, Ruby returns a reference to it. You store that reference in a variable, array, or other convenient place. Similar to a house address, the reference tells Ruby where the object “lives” on the heap.
Later, you can use that reference to call methods on the object (which, you might recall, is similar to sending it a message).
We want to stress this: variables, arrays, hashes, and so on never hold objects. They hold references to objects. Objects live on the heap, and they are accessed through the references held in variables.
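Here is a minimal sketch of that idea. The variable names are our own and only for illustration. A hash value holds a reference to an Array object, so a change made through one reference shows up through the other. The sketch uses Object#equal?, which returns true only when both sides refer to the very same object.

```ruby
# `colors` holds a reference to an Array object that lives on the heap.
colors = ["red"]

# The hash stores a copy of that reference, not a copy of the array.
palette = { "primary" => colors }

# Changing the array through the variable...
colors.push("blue")

# ...is visible through the hash, because both references point at one object.
p palette["primary"]                  # prints ["red", "blue"]
p palette["primary"].equal?(colors)   # prints true
```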
Andy met not one, but two, gorgeous women last week: Betty and Candace. Better yet, they both live on his street.
Andy intended to write down both their addresses in his address book. Unfortunately for him, he accidentally wrote down the same address (Betty’s) for both women.
Later that week, Betty received two letters from Andy:
Now, Betty is angry at Andy, and Candace (who never received a letter) thinks Andy is ignoring her.
What does any of this have to do with fixing our Ruby programs? You’re about to find out...
Andy’s dilemma can be simulated in Ruby with this simple class, called LoveInterest. A LoveInterest has an instance method, request_date, which will print an affirmative response just once. If the method is called again after that, the LoveInterest will report that it’s busy.
Normally, when using this class, we would create two separate objects and store references to them in two separate variables:
betty = LoveInterest.new
candace = LoveInterest.new
When we use the two separate references to call request_date on the two separate objects, we get two affirmative answers, as we expect.
We can confirm that we’re working with two different objects by using the object_id instance method, which almost all Ruby objects have. It returns a unique identifier for each object.
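The class listing isn’t reproduced here, so the following is a sketch of LoveInterest that assumes a simple @busy flag. The reply strings are our own invention. The sketch creates two separate objects, asks each for a date, and compares their object IDs.

```ruby
# Hypothetical reconstruction of LoveInterest; the reply strings are assumptions.
class LoveInterest
  def initialize
    @busy = false
  end

  # Accepts the first request, then reports being busy. Prints and returns the reply.
  def request_date
    reply = @busy ? "Sorry, I'm busy." : "Sure, let's go!"
    @busy = true
    puts reply
    reply
  end
end

betty = LoveInterest.new    # two calls to new...
candace = LoveInterest.new  # ...create two separate objects

betty_reply = betty.request_date      # prints "Sure, let's go!"
candace_reply = candace.request_date  # prints "Sure, let's go!"

puts betty.object_id == candace.object_id  # prints false
```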
But if we copy the reference instead, we wind up with two references to the same object, under two different names (the variables betty and candace).
This sort of thing is known as aliasing, because you have multiple names for a single thing. This can be dangerous if you’re not expecting it!
In this case, the calls to request_date both go to the same object. The first time, it responds that it’s available, but the second request is rejected.
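This is a sketch of the aliasing case, using the same assumed LoveInterest reconstruction (the @busy flag and the reply strings are our own). The only change is that candace is assigned a copy of betty’s reference instead of a new object.

```ruby
# Hypothetical reconstruction of LoveInterest; the reply strings are assumptions.
class LoveInterest
  def initialize
    @busy = false
  end

  # Accepts the first request, then reports being busy. Prints and returns the reply.
  def request_date
    reply = @busy ? "Sorry, I'm busy." : "Sure, let's go!"
    @busy = true
    puts reply
    reply
  end
end

betty = LoveInterest.new
candace = betty  # copies the reference, not the object: an alias

first_reply = betty.request_date     # prints "Sure, let's go!"
second_reply = candace.request_date  # prints "Sorry, I'm busy." (same object, already booked)

puts betty.object_id == candace.object_id  # prints true
```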
This aliasing behavior seems awfully familiar... Remember the astronomer’s malfunctioning star catalog program? Now that we’ve learned about aliasing, let’s take another look at it and see if we can figure out the problem this time...
If we try calling object_id on the objects in the three variables, we’ll see that all three variables refer to the same object. The same object under three different names... sounds like another case of aliasing!
By copying the contents of the variables, the astronomer did not get three distinct CelestialBody instances as he thought. Instead, he’s a victim of unintentional aliasing: he got one CelestialBody with three references to it!
To this poor, bewildered object, the sequence of instructions looked like this:
“Set your name attribute to 'Altair', and your type attribute is now 'star'.”
“Now set your name to 'Polaris'.”
“Now your name is 'Vega'.”
“Give us your name attribute 3 times.”
The CelestialBody dutifully complied, and told us three times that its name was now Vega.
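The astronomer’s listing isn’t reproduced here. This is a reconstruction of that sequence of instructions, assuming CelestialBody is a plain class with name and type attribute accessors. Comparing object IDs confirms there is only one object.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

altair = CelestialBody.new
altair.name = 'Altair'
altair.type = 'star'

polaris = altair          # copies the reference, not the object
polaris.name = 'Polaris'

vega = polaris            # a third name for the same object
vega.name = 'Vega'

puts altair.name, polaris.name, vega.name   # prints "Vega" three times

# Only one distinct object ID among the three variables.
p [altair, polaris, vega].map(&:object_id).uniq.size   # prints 1
```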
Fortunately, a fix will be easy. We just need to skip the shortcuts and actually create three CelestialBody instances.
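A sketch of the fix, under the same assumption that CelestialBody only has name and type accessors. Each variable now gets its own object from CelestialBody.new.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

altair = CelestialBody.new
altair.name = 'Altair'
altair.type = 'star'

polaris = CelestialBody.new   # a brand-new object this time
polaris.name = 'Polaris'
polaris.type = 'star'

vega = CelestialBody.new      # and another
vega.name = 'Vega'
vega.type = 'star'

puts altair.name, polaris.name, vega.name   # prints Altair, Polaris, Vega
```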
And as we can see from the output, the problem is fixed!
It’s definitely good policy to avoid copying references from variable to variable. But there are other circumstances where you need to be aware of how aliasing works, as we’ll see shortly.
Before we move on, we should mention a shortcut for identifying objects. We’ve already shown you how to use the object_id instance method. If it outputs the same value for the object in two variables, you know they both point to the same object.
The string returned by the inspect instance method also includes a hexadecimal number (made up of the digits 0 through 9 and the letters a through f) that identifies the object. It won’t necessarily match the number object_id returns, but it serves the same purpose. You don’t need to know the details of how hexadecimal works; just know that if you see the same value for the object referenced by two variables, you have two aliases for the same object. A different value means a different object.
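As a small illustration (the class is assumed, as before), we can pull the hexadecimal part out of each inspect string with a regular expression and compare the results. An alias shows the same value; a separately created object shows a different one.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

first = CelestialBody.new
same_as_first = first        # a second reference to the same object
second = CelestialBody.new   # a different object

p first, same_as_first, second   # p calls inspect on each before printing

# Extract the hexadecimal identifier (e.g. "0x000...") from an inspect string.
hex_id = ->(obj) { obj.inspect[/0x\h+/] }

puts hex_id.(first) == hex_id.(same_as_first)  # prints true
puts hex_id.(first) == hex_id.(second)         # prints false
```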
The astronomer is back, with more problematic code...
He needs his hash to be a mix of planets and moons. Since most of his objects will be planets, he set the hash default object to a CelestialBody with a type attribute of "planet". (We saw hash default objects last chapter; they let you set an object the hash will return any time you access a key that hasn’t been assigned to.)
He believes that will let him add planets to the hash simply by assigning names to them. And it seems to work:
When the astronomer needs to add a moon to the hash, he can do that, too. He just has to set the type attribute in addition to the name.
But then, as he continues adding new CelestialBody objects to the hash, it starts behaving strangely...
The problems with using a CelestialBody as a hash default object become apparent as the astronomer tries to add more objects to the hash. When he adds another planet after adding a moon, the planet’s type attribute is set to "moon" as well!
If he goes back and gets the value for the keys he added previously, those objects appear to have been modified as well!
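This is a reconstruction of the astronomer’s approach. It assumes the same two-accessor CelestialBody, and the planet and moon names other than Mars are only illustrative. Because every unassigned key hands back the one default object, each change lands on that shared object.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

default_body = CelestialBody.new
default_body.type = 'planet'
bodies = Hash.new(default_body)   # one shared default object for every missing key

bodies['Mars'].name = 'Mars'
bodies['Europa'].name = 'Europa'
bodies['Europa'].type = 'moon'
bodies['Venus'].name = 'Venus'

p bodies['Venus'].type   # prints "moon", not "planet"
p bodies['Mars'].name    # prints "Venus": Mars seems to have been overwritten
```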
Good observation! Remember we said that the string returned by the inspect method includes an identifier for the object? And as you know, the p method calls inspect on each object before printing it. Using the p method shows us that every one of those hash keys gives us the same object!
Looks like we’ve got a problem with aliasing again! On the next few pages, we’ll see how to fix it.
The central problem with this code is that we’re not actually modifying hash values. Instead, we’re modifying the hash default object.
We can confirm this using the default instance method, which is available on all hashes. It lets us look at the default object after we create the hash.
Let’s inspect the default object both before and after we attempt to add a planet to the hash.
So why is a name being added to the default object? Shouldn’t it be getting added to the hash value for bodies['Mars']?
If we look at the object IDs for both bodies['Mars'] and the hash default object, we’ll have our answer:
p bodies['Mars']
p bodies.default
When we access bodies['Mars'], we’re still getting a reference to the hash default object! But why?
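A short sketch of that check, using Hash#default to read the default object back out and Object#equal? to compare identities. CelestialBody is assumed to have name and type accessors.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

default_body = CelestialBody.new
default_body.type = 'planet'
bodies = Hash.new(default_body)

bodies['Mars'].name = 'Mars'

p bodies.default.name                     # prints "Mars": the default itself changed
p bodies['Mars'].equal?(bodies.default)   # prints true: same object
```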
When we introduced the hash default object in the last chapter, we said that you get the default object any time you access a key that hasn’t been assigned to yet. Let’s take a closer look at that last detail.
Let’s suppose we’ve created a hash that will hold student names as the keys, and their grades as the corresponding values. We want the default to be a grade of 'A'.
grades = Hash.new('A')
At first, the hash is completely empty. Any student name that we request a grade for will come back with the hash default object, 'A'.
When we assign a value to a hash key, we’ll get that value back instead of the hash default the next time we try to access it.
Even when some keys have had values assigned, we’ll still get the default object for any key that hasn’t been assigned previously.
But accessing a hash value is not the same as assigning to it. If you access a hash value once and then access it again without making an assignment, you’ll still be getting the default object.
Only when a value is assigned to the hash (not just retrieved from it) will anything other than the default object be returned.
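A sketch of those rules with the grades hash. The student names are made up for illustration. Reading an unassigned key returns the default without storing anything; only an explicit assignment adds a key.

```ruby
grades = Hash.new('A')

p grades['Regina']      # prints "A" (no value assigned yet)
p grades['Regina']      # still "A"; reading a key never stores it

grades['Carl'] = 'B'    # an actual assignment to the hash
p grades['Carl']        # prints "B"
p grades['Regina']      # still the default, "A"

p grades.size           # prints 1: only 'Carl' was stored
```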
And that is why, when we try to set the type and name attributes of objects in the hash of planets and moons, we wind up altering the default object instead. We’re not actually assigning any values to the hash. In fact, if we inspect the hash itself, we’ll see that it’s totally empty!
Actually, those are calls to the name= and type= attribute writer methods on the hash default object. Don’t mistake them for assignment to the hash.
When we access a key for which no value has been assigned, we get the default object back.
A statement like bodies['Mars'].name = 'Mars' is not an assignment to the hash. It attempts to access a value for the key 'Mars' from the hash (which is still empty). Since there is no value for 'Mars', it gets a reference to the default object, which it then modifies.
And since there’s still nothing assigned to the hash, the next access gets a reference to the default object as well, and so on.
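To make the distinction concrete, here is a sketch that contrasts the two forms, again assuming a two-accessor CelestialBody. Calling name= on the result of a lookup leaves the hash empty. Writing hash[key] = value calls the hash’s own []= method and actually stores something.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

default_body = CelestialBody.new
default_body.type = 'planet'
bodies = Hash.new(default_body)

# Looks like assignment, but it's a hash access followed by a call to name=.
bodies['Mars'].name = 'Mars'
p bodies.empty?   # prints true

# A real assignment to the hash.
jupiter = CelestialBody.new
jupiter.name = 'Jupiter'
bodies['Jupiter'] = jupiter
p bodies.size     # prints 1
```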
Fortunately, we have a solution...
We’ve determined that this code doesn’t assign a value to the hash, it just accesses a value. It gets a reference to the default object, which it then (unintentionally) modifies.
Right now, when we access a hash key for which no value has been assigned, we just get a reference to the hash default object.
What we really want is to get an entirely new object for each unassigned hash key.
Of course, if we did that without assigning to the hash, then later accesses would just keep generating new objects over and over...
So it would also be nice if the new object were assigned to the hash for us, so that later accesses would get the same object again (instead of generating new objects over and over).
Hashes have a feature that can do all of this for us!
Instead of passing an argument to Hash.new to be used as a hash default object, you can pass a block to Hash.new to be used as the hash default block. When a key is accessed for which no value has been assigned:
The block is called.
The block receives references to the hash and the current key as block parameters. These can be used to assign a value to the hash.
The block return value is returned as the current value of the hash key.
Those rules are a bit complex, so we’ll go over them in more detail in the next few pages. But for now, let’s take a look at your first hash default block:
If we access keys on this hash, we get separate objects for each key, just like we always intended.
Better yet, the first time we access any key, a value is automatically assigned to the hash for us!
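The block itself isn’t reproduced above. This sketch follows the structure the chapter describes: create an object, assign it to the hash, then return it on a separate line. It assumes the usual two-accessor CelestialBody and uses illustrative key names.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

bodies = Hash.new do |hash, key|
  body = CelestialBody.new
  body.type = 'planet'
  hash[key] = body   # store the new object under the key being accessed
  body               # return the same object the block stored
end

bodies['Mars'].name = 'Mars'
bodies['Europa'].name = 'Europa'
bodies['Europa'].type = 'moon'

p bodies['Mars'].type                       # prints "planet"
p bodies['Europa'].type                     # prints "moon"
p bodies['Mars'].equal?(bodies['Europa'])   # prints false: separate objects
p bodies.keys                               # prints ["Mars", "Europa"]
```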
Now that we know it will work, let’s take a closer look at the components of that block...
In most cases, you’ll want to assign the value created by your hash default block to the hash. A reference to the hash and the current key are passed to the block, in order to allow you to do so.
When we assign values to the hash in the block body, things work like we’ve been expecting all along. A new object is generated for each new key you access. On subsequent accesses, we get the same object back again, with any changes we’ve made intact.
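The difference that assignment makes is easiest to see side by side. In this sketch, the hash names and the 'todo' key are just for illustration. The first block returns a fresh array without storing it. The second block stores the array before returning it.

```ruby
# Without assignment: every access builds a new array, and nothing is stored.
forgetful = Hash.new { |hash, key| [] }
forgetful['todo'] << 'buy milk'   # appended to a throwaway array
p forgetful['todo']               # prints [] (yet another new array)
p forgetful.empty?                # prints true

# With assignment: the first access stores the array; later accesses reuse it.
remembering = Hash.new do |hash, key|
  list = []
  hash[key] = list
  list
end
remembering['todo'] << 'buy milk'
p remembering['todo']             # prints ["buy milk"]
```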
When you access an unassigned hash key for the first time, the hash default block’s return value is returned as the value for the key.
As long as you assign a value to the key within the block body, the hash default block won’t be invoked for subsequent accesses of that key; instead, you’ll get whatever value was assigned.
Make sure the block return value matches what you’re assigning to the hash!
Otherwise, you’ll get one value when you first access the key, and a completely different value on subsequent accesses.
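This sketch deliberately gets the rule wrong, to show the symptom. The block stores one value but returns a different one, so the first access and every later access disagree.

```ruby
mismatched = Hash.new do |hash, key|
  hash[key] = 'stored value'
  'returned value'   # NOT what was stored!
end

p mismatched['x']   # prints "returned value" (the block's return value)
p mismatched['x']   # prints "stored value" (the block isn't called again)
```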
Generally speaking, you won’t need to work very hard to remember this rule. As we’ll see on the next page, setting up an appropriate return value for your hash default block happens quite naturally...
Thus far, we’ve been returning a value from the hash default block on a separate line:
But Ruby offers a shortcut that can reduce the amount of code in your default block a bit...
You’ve already learned that the value of the last expression in a block is treated as the block’s return value... What we haven’t mentioned is that in Ruby, the value of an assignment expression is the same as the value being assigned.
So we can use an assignment statement by itself in a hash default block, and it will return the assigned value.
And, of course, it will add the value to the hash as well.
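A few lines that show this behavior (all names here are illustrative). An assignment evaluates to the assigned value, whether the target is a variable or a hash key. So a block ending in an assignment both stores and returns the value.

```ruby
# The value of an assignment expression is the value that was assigned.
result = (count = 42)
p result, count   # prints 42 twice

# The same goes for assigning to a hash key.
totals = {}
stored = (totals['apples'] = 3)
p stored          # prints 3

# So a block whose last expression is an assignment returns the assigned value.
squares = Hash.new { |hash, n| hash[n] = n * n }
p squares[4]          # prints 16
p squares.key?(4)     # prints true: the value was stored, too
```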
So, in the astronomer’s hash, instead of adding a separate line with a return value, we can just let the value of the assignment expression provide the return value for the block.
Here’s our final code for the hash default block:
Here’s how the program works now:
We use a hash default block to create a unique object for each hash key. (This is unlike a hash default object, which gives references to one object as the default for all keys.)
Within the block, we assign the new object to the current hash key.
The new object becomes the value of the assignment expression, which also becomes the block’s return value. So the first time a given hash key is accessed, a new object is returned as the corresponding value.
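Putting that together, the astronomer’s default block might look something like this sketch. It assumes the same two-accessor CelestialBody and is not necessarily the book’s exact listing. The assignment is the last expression, so it supplies the return value.

```ruby
# Assumed shape of the class: two attribute accessors.
class CelestialBody
  attr_accessor :name, :type
end

bodies = Hash.new do |hash, key|
  body = CelestialBody.new
  body.type = 'planet'
  hash[key] = body   # last expression: its value (body) is the block's return value
end

bodies['Mars'].name = 'Mars'
bodies['Europa'].name = 'Europa'
bodies['Europa'].type = 'moon'

p bodies['Mars'].name                     # prints "Mars"
p bodies['Mars'].type                     # prints "planet"
p bodies['Mars'].equal?(bodies['Mars'])   # prints true: same object on every access
```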
Hash default objects work very well if you use a number as the default.
Okay, it’s a little more complicated than that. Hash default objects work very well if you don’t change the default, and if you assign values back to the hash. It’s just that numbers make it easy to follow these rules.
Take this example, which counts the number of times letters occur in an array. (It works just like the vote counting code from last chapter.)
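The example listing isn’t reproduced here. A sketch in the same spirit, with a made-up array of letters, might look like this.

```ruby
letters = ['a', 'c', 'a', 'b', 'c', 'a']

counts = Hash.new(0)   # a number as the hash default object
letters.each do |letter|
  counts[letter] += 1  # short for counts[letter] = counts[letter] + 1
end

puts counts['a']   # prints 3
puts counts['b']   # prints 1
puts counts['z']   # prints 0 (never assigned, so we get the default)
```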
Using a hash default object here works because we follow the above two rules...
If you’re going to use a hash default object, it’s important not to modify that object. Otherwise, you’ll get unexpected results the next time you access the default. We saw this happen when we used a default object (instead of a default block) for the astronomer’s hash, and it caused havoc:
In Ruby, doing math operations on a numeric object doesn’t modify that object; it returns an entirely new object. We can see this if we look at object IDs before and after an operation.
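A quick check of that claim. The variable name is illustrative. Adding to a number gives the variable a reference to a different object, and Integer objects report themselves as frozen.

```ruby
number = 1000
before = number.object_id

number += 1   # builds a new Integer and points `number` at it

puts number.object_id == before   # prints false
puts 1000.frozen?                 # prints true: Integers can't be modified
```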
In fact, numeric objects are immutable: they don’t have any methods that modify the object’s state. Any operation that might change the number gives you back an entirely new object.
That’s what makes numbers safe to use as hash default objects; you can be certain that the default number won’t be changed accidentally.
Numbers make good hash default objects because they are immutable.
If you’re going to use a hash default object, it’s also important to ensure that you’re actually assigning values to the hash. As we saw with the astronomer’s hash, sometimes it can look like you’re assigning to the hash when you’re not...
When we use a number as a default object, though, it’s much more natural to actually assign values to the hash. (Because numbers are immutable, we can’t store the incremented values unless we assign them to the hash!)
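The sketch below shows why the += form ends up assigning (the names are illustrative). Just adding to the default produces a new number that goes nowhere. Writing counts[key] += 1 expands to counts[key] = counts[key] + 1, which stores the result.

```ruby
counts = Hash.new(0)

incremented = counts['a'] + 1   # a new number; the hash is untouched
p incremented                   # prints 1
p counts.empty?                 # prints true

counts['a'] += 1                # expands to counts['a'] = counts['a'] + 1
p counts['a']                   # prints 1
p counts.size                   # prints 1
```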
That’s true. So we have a rule of thumb that will keep you out of trouble...
If your default is a number, you can use a hash default object.
If your default is anything else, you should use a hash default block.
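For a mutable default such as an empty array, here is a sketch of what the rule of thumb guards against, next to the block-based alternative. The hash and key names are illustrative.

```ruby
# Risky: a mutable Array as the hash default object.
shared = Hash.new([])
shared['fruits'] << 'apple'     # modifies the one default array
shared['veggies'] << 'carrot'   # ...and modifies it again
p shared['anything']            # prints ["apple", "carrot"]
p shared.empty?                 # prints true: nothing was assigned

# Safer: a default block creates and stores a fresh array for each key.
separate = Hash.new { |hash, key| hash[key] = [] }
separate['fruits'] << 'apple'
separate['veggies'] << 'carrot'
p separate['fruits']            # prints ["apple"]
p separate['veggies']           # prints ["carrot"]
```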
As you gain more experience with references, all of this will become second nature, and you can break this rule of thumb when the time is right. Until then, this should prevent most problems you’ll encounter.
Understanding Ruby references and the issue of aliasing won’t help you write more powerful Ruby programs. It will help you quickly find and fix problems when they arise, however. Hopefully this chapter has helped you form a basic understanding of how references work, and will let you avoid trouble in the first place.
In the next chapter, we’re going to get back to the topic of organizing your code. You’ve already learned how to share methods between classes with inheritance. But even in situations where inheritance isn’t appropriate, Ruby offers a way to share behavior across classes: mixins. We’ll learn about those next!