Chapter 12. Does This Job Require a Reference?


By the time you finish this chapter, you will understand the following Perl statements:

use Data::Dumper;
use warnings;
use strict;
my $student={ "Name"=>undef,
              "Major"=>undef,
              "Courses"=>[],
              "Stats"=>{},
            };
$student->{"Courses"}=[ qw( French Algebra Chemistry ) ];
$student->{"Stats"}->{"PointAve"}=3.5;
$student->{"Stats"}->{"StartDate"}="09/17/12";
print Dumper $student;

12.1 What Is a Reference?

You have a post office box where you receive mail. The address on a letter contains a reference to your mailbox. The postman goes to that address and puts the letter in the box. You go to your mailbox address and pull out the letter. If you didn’t have the mailbox, the postman would have to hand you the letter directly. Instead, he did it indirectly by putting it in the box. That’s a simplified attempt to explain references.

A Perl reference is a variable that refers to another one. In short, it contains the address of another variable. The terms reference and pointer are often used interchangeably in Perl, because they both point to something, but they are not really the same, and those in Perl circles avoid using the term pointer. The main difference is that pointers in other languages contain the integer value of a memory address (for a specified data type), which you can manipulate directly. For example, with a C pointer, you can step from element to element in an array of integers by performing pointer arithmetic, such as p++, where p contains the memory address of an array of ints. You can’t do that with Perl references. As you know, Perl variables can hold any data type at any given time. We don’t declare ints, floats, and the like. Perl handles all that. (To see how C, C++, Java, and other languages handle pointers and to get a really good introduction to pointers in general, Stanford provides a simple video called “Pointer Fun with Binky” found at http://cslibrary.stanford.edu/104/.)

Unlike pointers, Perl references are data structures that are displayed as strings, not integers. They contain the data type and the hexadecimal address of the variable they reference; for example, here is a reference to a scalar variable: SCALAR(0xb057c). Unlike C, Perl keeps track of managing memory and of reference counts, and when there are no more references to the data, then Perl will automatically destroy the data. But, as mentioned earlier, because both references and pointers do point to something, the terms are often used interchangeably.

When you create a Perl reference, it is stored in a scalar variable. Now the big question is, “What’s the point? Why do we need references?” There are three good reasons to use references:

• To pass arguments by reference to subroutines

• To create complex data structures, such as a hash of hashes, an array of arrays, a hash consisting of nested hashes, arrays, subroutines, and so forth

• To create Perl objects, as shown in the next chapter

Perl has two types of references: hard references and symbolic references. The hard references were introduced with Perl 5. Before hard references, typeglob aliases were used, but were of limited usefulness, other than for manipulating the internal symbol table (see Section 13.1.4, “The Symbol Table,” in Chapter 13, “Modularize It, Package It, and Send It to the Library!”). A symbolic reference is when a variable holds the name of another variable and is also of limited use (http://perlmaven.com/symbolic-reference-in-perl). Although this chapter focuses on hard references, we will include a discussion of symbolic references and typeglobs at the end of the chapter.

12.1.1 Hard References

A hard reference is a scalar that holds the address of another variable or subroutine. It is an indirect way to access a variable. When printed, a Perl reference shows not only the hexadecimal address but also the data type:

ARRAY(0x7f9241004ee8)

The reference can point to (reference) a scalar, an array, a hash, a subroutine, a typeglob, another reference, and so forth.

The Backslash Operator

The backslash unary operator is used to create a reference, similar to the & used in C to get the “address of.” In the following example, $p is a scalar that is assigned a reference to $x.

$x = "Tom";
$p = \$x;   # $p gets the memory address of $x

Examples of hard references from the Perl man page perlref include the following:

$scalarref = \$foo;        # reference to scalar $foo
$arrayref  = \@ARGV;       # reference to array @ARGV
$hashref   = \%ENV;        # reference to hash %ENV
$coderef   = \&handler;    # reference to subroutine handler
$globref   = \*STDOUT;     # reference to typeglob STDOUT
$reftoref  = \$scalarref;  # reference to another reference
                           # (pointer to pointer, ugh)
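
The perlref fragments above assume variables that already exist. As a minimal runnable sketch, assuming some made-up data ($name, @list, %ages, and the handler subroutine are invented for illustration), the same references can be created and printed like this:

```perl
use strict;
use warnings;

# Made-up data to point at (assumptions for illustration only)
my $name = "Tom";
my @list = qw(Tom Dick Harry);
my %ages = (Tom => 30);
sub handler { return "handled" }

my $scalarref = \$name;        # reference to a scalar
my $arrayref  = \@list;        # reference to an array
my $hashref   = \%ages;        # reference to a hash
my $coderef   = \&handler;     # reference to a subroutine
my $reftoref  = \$scalarref;   # reference to another reference

# Each one prints as TYPE(0xADDRESS); the addresses change from run to run.
print "$_\n" for ($scalarref, $arrayref, $hashref, $coderef, $reftoref);
```

The type part of each printed value should be SCALAR, ARRAY, HASH, CODE, and REF, in that order.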

Dereferencing the Pointer

If you print the value of a reference, you will see a data type and a hexadecimal address.

@list = qw(Tom Dick Harry);
$ref = \@list;  # $ref contains the data type and memory address of @list


Figure 12.1 $ref contains the address of @list.

If you want to go to the address that $ref points to (that is, the address of @list), and get the values stored there, you would say:

print @{$ref};   # prints TomDickHarry (the three elements, with no separator)

This is called dereferencing the pointer. Notice that $ref (the address) is prefixed with the @ sign, which tells Perl to get the array values from @list. (Although the curly braces aren’t necessary in this example, they will be later on in more complex examples.) Reading from the inside out, the dollar sign comes first, because the reference itself is a scalar, $ref; in front of it goes the sigil representing the type of data it references. In the following examples, we will get into much more detail about how to use references.
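
To make the sigil rule concrete, here is a small sketch built on the same @list and $ref. The extra element "Joe" is made up for illustration.

```perl
use strict;
use warnings;

my @list = qw(Tom Dick Harry);
my $ref  = \@list;

print "$ref\n";               # the reference itself, e.g. ARRAY(0x...)
print "@{$ref}\n";            # Tom Dick Harry (the whole array)
print "@$ref\n";              # same thing; the braces are optional here
print "${$ref}[0]\n";         # Tom (one element, so the outer sigil is $)
print scalar(@$ref), "\n";    # 3 (the number of elements)

push @$ref, "Joe";            # changes @list, because $ref points to it
print "@list\n";              # Tom Dick Harry Joe
```

Because $ref holds the address of @list rather than a copy, the push through the reference shows up in @list itself.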

12.1.2 References and Anonymous Variables

It is not necessary to name a variable to create a reference to it. If an array, hash, or subroutine has no name, it is called anonymous. If, for example, an array has no name and its data is assigned to a reference variable, you can use the reference to assign and fetch data from the anonymous array.

Use the arrow operator (->), called an infix operator, to dereference a reference to anonymous arrays, hashes, and subroutines. Although not really necessary, the arrow operator makes the program easier to read.

Anonymous Arrays

Enclose anonymous array elements in square brackets ([ ]). These square brackets are not to be confused with the square brackets used to subscript an array. They represent the address of an unnamed array. The brackets will not be interpolated if enclosed within quotes. Use the arrow (infix) operator to get the individual elements of the array.
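
A minimal sketch of an anonymous array follows. The variable name $toys and its contents are made up for illustration.

```perl
use strict;
use warnings;

# The square brackets build an unnamed array and return a reference to it.
my $toys = [ 'Woody', 'Buzz', 'Bo', 'Rex' ];

print "The first toy is $toys->[0].\n";     # arrow (infix) operator
print "The last toy is $$toys[-1].\n";      # same idea without the arrow
print "All the toys: @{$toys}\n";           # dereference the whole array

my $count = scalar @{$toys};
print "There are $count toys.\n";           # There are 4 toys.
```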

Anonymous Hashes

An anonymous hash is a hash without a name. Create a reference to it by using curly braces ({}). You can mix array and hash composers to produce complex data types. These braces are not the same braces that are used when subscripting a hash. The anonymous hash is assigned to a scalar reference.
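
As a sketch, with made-up keys and values, an anonymous hash can be created, read, and extended through its reference like this:

```perl
use strict;
use warnings;

# The curly braces build an unnamed hash and return a reference to it.
my $toy = { Name => "Woody", Type => "cowboy" };

print "$toy->{Name} is a $toy->{Type}.\n";   # Woody is a cowboy.

$toy->{Owner} = "Andy";                      # add a key through the reference

# %$toy dereferences the whole hash; sorting keeps the order predictable.
for my $key (sort keys %$toy) {
    printf "%-6s => %s\n", $key, $toy->{$key};
}
```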

12.1.3 Nested Data Structures

The ability to create references to anonymous data structures lends itself to more complex types. For example, you can have hashes nested in hashes, or arrays of hashes, or arrays of arrays, and so forth.

Just as with simpler references, you dereference the anonymous data structures by prepending the reference with the correct funny symbol (sigil) that represents its data type. For example, if $p is a pointer to a scalar, you can write $$p to dereference the scalar reference, and if $p is a pointer to an array, you can write @$p to dereference the array reference or $$p[0] to get the first element of the array. You can also dereference a reference by treating it as a block. You could write $$p[0] as ${$p}[0], or take a slice of the array with @{$p}[0..3]. Sometimes, you use the braces to prevent ambiguity, and sometimes they are necessary so that the funny character dereferences the correct part of the structure.
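
The different spellings can be compared side by side. In this sketch the variables $x, $sp, $ap, and @refs are made up for illustration.

```perl
use strict;
use warnings;

my $x  = 5;
my $sp = \$x;                       # reference to a scalar
my $ap = [ 10, 20, 30, 40 ];        # reference to an anonymous array

print "$$sp ${$sp}\n";                  # 5 5
print "$$ap[0] ${$ap}[0] $ap->[0]\n";   # 10 10 10

my @slice = @{$ap}[0..2];               # array slice through the reference
print "@slice\n";                       # 10 20 30

# Here the braces are required: dereference the element $refs[0],
# not a scalar variable named $refs.
my @refs  = (\"first", \"second");
my $first = ${ $refs[0] };
print "$first\n";                       # first
```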

Using Data::Dumper

Now that we are getting ready to create more complex data structures with Perl references, this is a good time to take a moment to talk about the Data::Dumper module. This module, found in the standard Perl library, makes it easy for you to see the contents of nested hashes, arrays, and combinations of these.
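
A minimal sketch of Data::Dumper in use is shown here. The $pet structure is made up, and setting $Data::Dumper::Sortkeys is optional; it is used only so the keys print in a predictable order.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;     # optional: print hash keys in sorted order

# A made-up nested structure: a hash containing an array
my $pet = {
    Name   => "Fido",
    Tricks => [ "sit", "roll over" ],
};

my $dump = Dumper($pet);         # returns the structure as a string of Perl code
print $dump;
```

The output starts with $VAR1 = and shows the nested array indented under its key, which is often the quickest way to check that a structure has the shape you intended.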

Array of Lists

An array may contain another list or set of lists, most commonly used to create a multidimensional array. Each row in square brackets is a reference to an anonymous array.
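
As a sketch with made-up numbers, a named array whose rows are anonymous arrays might look like this:

```perl
use strict;
use warnings;

# A named array (parentheses) whose elements are references
# to anonymous arrays (square brackets).
my @matrix = (
    [ 0,  2,  4 ],
    [ 4,  1, 32 ],
    [ 12, 15, 17 ],
);

print "Row 0, column 2 is $matrix[0][2].\n";     # 4
print "Row 1, column 2 is $matrix[1]->[2].\n";   # 32 (the arrow is optional between subscripts)

for my $row (@matrix) {
    print "@$row\n";      # each $row is a reference to one anonymous array
}
```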

A Reference to a List of Lists

In the following example, a reference points to an anonymous list of lists. Notice that in this example, $matrix is a reference to an anonymous array in square brackets, whereas in the previous example, @matrix is a named array with parentheses to contain its values. This distinction is important as it is a common error to use [ ] when one should use ( ), and vice versa.
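
The distinction can be sketched as follows, with made-up numbers. Here $matrix is a scalar holding a reference, so the outer list uses square brackets and the first subscript needs an arrow.

```perl
use strict;
use warnings;

# Square brackets on the outside too: $matrix is a reference
# to an anonymous array of anonymous arrays.
my $matrix = [
    [ 0,  2,  4 ],
    [ 4,  1, 32 ],
    [ 12, 15, 17 ],
];

print "Row 1, column 2 is $matrix->[1][2].\n";    # 32
print "Row 2, column 0 is $$matrix[2][0].\n";     # 12

for my $row (@$matrix) {      # dereference the outer array
    print "@$row\n";          # then each inner one
}

my $rows = scalar @$matrix;
print "The matrix has $rows rows.\n";             # The matrix has 3 rows.
```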

Array of Hashes

A list may contain a hash or references to hashes. In Example 12.8, a reference is assigned an anonymous array containing two anonymous hash references.
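
A minimal sketch of a reference to an array of hashes follows. The pet data is made up and is not necessarily what Example 12.8 uses.

```perl
use strict;
use warnings;

# An anonymous array holding two anonymous hashes
my $pets = [
    { Name => "Rover", Type => "dog" },
    { Name => "Tabby", Type => "cat" },
];

print "The first pet is $pets->[0]{Name}, a $pets->[0]{Type}.\n";

for my $pet (@$pets) {                  # each $pet is a hash reference
    for my $key (sort keys %$pet) {
        print "$key: $pet->{$key}\n";
    }
    print "\n";
}
```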

Hash of Hashes

A hash may contain another hash or a set of hash references. In Example 12.9, a reference is assigned an anonymous hash reference consisting of two keys, each of which is associated with a value that happens to be another hash reference (consisting of its own key/value pairs).
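
As a sketch of a hash of hashes, with made-up class names, students, and scores (not necessarily those of Example 12.9):

```perl
use strict;
use warnings;

# An anonymous hash with two keys; each value is another anonymous hash.
my $grades = {
    Math    => { Anna => 100, Hao => 95 },
    Science => { Lou  => 96,  Sam => 79 },
};

print "Anna got $grades->{Math}{Anna} in Math.\n";    # Anna got 100 in Math.

for my $class (sort keys %$grades) {
    print "$class\n";
    # The braces are needed so that % applies to the inner hash reference.
    for my $student (sort keys %{ $grades->{$class} }) {
        print "    $student: $grades->{$class}{$student}\n";
    }
}
```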

12.1.4 More Nested Structures

A hash may contain nested hash keys associated with lists of values. In Example 12.10, a named hash has two keys whose values are references to another set of nested key/value pairs. And the values for the nested keys are references to arrays. Argh!

In Example 12.11, a reference is assigned the address of an anonymous hash (has no name). It also has two keys whose values are another set of key/value pairs.

The only difference between Example 12.10 and Example 12.11 is that in Example 12.10 the definition of a named hash, %profession, is enclosed in parentheses, and consists of nested key/value pairs; in Example 12.11, a reference to a nameless hash is defined and enclosed in curly braces, not parentheses. Also, when extracting the values, the syntax is different when using a named hash versus a reference to an unnamed hash. When using these nested structures, the syntax can get confusing. Use Data::Dumper to help you see what kind of a monster you have created!
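
The two forms can be compared in a short sketch. The name %profession comes from the text above; $prof_ref and all keys and values are made up for illustration.

```perl
use strict;
use warnings;

# Named hash: parentheses on the outside
my %profession = (
    Teacher  => { Subjects    => [ "Science", "Math", "English" ] },
    Musician => { Instruments => [ "piano", "flute", "harp" ] },
);

# Reference to an anonymous hash: curly braces on the outside
my $prof_ref = {
    Teacher  => { Subjects    => [ "Science", "Math", "English" ] },
    Musician => { Instruments => [ "piano", "flute", "harp" ] },
};

print "$profession{Teacher}{Subjects}[0]\n";          # Science (no arrow after a named hash)
print "$prof_ref->{Teacher}{Subjects}[0]\n";          # Science (arrow after the reference)
print "@{ $profession{Musician}{Instruments} }\n";    # piano flute harp
print "@{ $prof_ref->{Musician}{Instruments} }\n";    # piano flute harp
```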

12.1.5 References and Subroutines

Anonymous Subroutines

A reference to an anonymous subroutine is created by using the keyword sub without a subroutine name. The expression is terminated with a semicolon. For more on using anonymous subroutines, see Section 14.3.1, “What Is a Closure?”
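
A minimal sketch of an anonymous subroutine follows. The name $greet and its behavior are made up for illustration.

```perl
use strict;
use warnings;

# sub with no name returns a code reference; note the closing semicolon.
my $greet = sub {
    my ($name) = @_;
    return "Hello, $name!";
};

my $first  = $greet->("Tom");     # call through the arrow operator
my $second = &$greet("Dick");     # older style: dereference with &

print "$first\n";                 # Hello, Tom!
print "$second\n";                # Hello, Dick!
```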

Subroutines and Passing by Reference

When you pass arguments to a subroutine, they are stored in the @_ array. If you pass several arguments (say an array, a scalar, and another array), they are all flattened into one list in @_. It would be hard to tell where one argument ended and the next began unless you also passed along the size of each array, and then each size would itself land in @_ and you would have to fetch it to determine where the first array ended, and so on. @_ could also be quite large if you are passing a 1,000-element array. So, the easiest and most efficient way to pass arrays and hashes is by reference, as shown in Example 12.13.
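
The flattening problem and its fix can be sketched as follows. The subroutine names count_flat, count_by_ref, and double_all are hypothetical, as is the data.

```perl
use strict;
use warnings;

my @first  = (1, 2, 3);
my @second = (10, 20, 30, 40, 50);

# Passing the arrays themselves: everything is flattened into @_.
sub count_flat { return scalar @_; }
my $flat = count_flat(@first, "label", @second);
print "Flattened: $flat arguments\n";          # Flattened: 9 arguments

# Passing references: @_ holds exactly three scalars.
sub count_by_ref {
    my ($aref, $label, $bref) = @_;
    return sprintf "%s: %d and %d", $label, scalar(@$aref), scalar(@$bref);
}
my $summary = count_by_ref(\@first, "sizes", \@second);
print "$summary\n";                            # sizes: 3 and 5

# A reference also lets the subroutine change the caller's array.
sub double_all {
    my ($nums) = @_;
    $_ *= 2 for @$nums;
    return;
}
double_all(\@first);
print "@first\n";                              # 2 4 6
```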

12.1.6 The ref Function

The ref function tests whether a value is a reference. It returns a non-empty string if its argument is a reference; with no argument, it tests $_. The string returned is the type of data the reference points to; for example, SCALAR is returned if the reference points to a scalar, and ARRAY is returned if it points to an array. If the argument is not a reference, the empty string is returned. Table 12.1 lists the values returned by the ref function.


Table 12.1 Return Values* from the ref Function
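
A short sketch shows ref at work on several kinds of values. The describe subroutine and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report what kind of thing was passed in.
sub describe {
    my ($thing) = @_;
    my $type = ref $thing;            # empty string if $thing is not a reference
    return $type eq '' ? "not a reference" : "reference to $type";
}

my $name   = "Tom";
my @report = (
    describe(\$name),        # reference to SCALAR
    describe([ 1, 2, 3 ]),   # reference to ARRAY
    describe({ a => 1 }),    # reference to HASH
    describe(sub { 1 }),     # reference to CODE
    describe(\*STDOUT),      # reference to GLOB
    describe(\\$name),       # reference to REF
    describe("Tom"),         # not a reference
);
print "$_\n" for @report;
```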

12.1.7 Symbolic References

A hard reference is a scalar variable that holds the address of another type of data. This chapter focused on hard references. This is an example of the value of a hard reference:

ARRAY(0x7f9241004ee8)

A symbolic reference names another variable rather than just pointing to a value; that is, it doesn’t contain the data type and address. Symbolic references are discouraged because they can refer only to package (global) variables, never to lexically scoped (my) variables, and they will not get past strict if you have it turned on. You would see something like this: Global symbol "$animal" requires explicit package name at symbolicref line 3.

Example 12.16 demonstrates a symbolic reference where the value of one variable references the name of another variable. For more on symbolic references, see http://perlmaven.com/symbolic-reference-in-perl.
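
As a sketch of the idea (not necessarily the code of Example 12.16), the following uses a package variable $animal and a made-up variable $varname that holds its name. strict refs is switched off only inside the block where the symbolic reference is used.

```perl
use strict;
use warnings;

our $animal  = "dog";        # a package variable, $main::animal
my  $varname = "animal";     # holds the NAME of the variable, not its address

my $seen;
{
    no strict 'refs';        # symbolic references are allowed only in this block
    $seen = ${$varname};     # looks up $main::animal by name
    ${$varname} = "cat";     # assigns to $main::animal by name
}

print "Before: $seen\n";     # Before: dog
print "After: $animal\n";    # After: cat
```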

The strict Pragma

To protect yourself from inadvertently using symbolic references in a program, use the strict pragma with the refs argument. This restricts the use of symbolic references in your program. Here, we re-execute the previous example using the strict pragma.
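
A sketch of what strict does with a symbolic reference follows. The eval block is used here only so the program can catch the error and keep running; the variables $animal, $varname, and $error are made up for illustration.

```perl
use strict;      # includes strict 'refs'
use warnings;

our $animal  = "dog";
my  $varname = "animal";

# Under strict refs, using a string as a reference is a runtime error.
my $value = eval { ${$varname} };
my $error = $@;

if ($error) {
    print "Caught: $error";
}
```

The error message should say that the string "animal" cannot be used as a SCALAR ref while "strict refs" is in use.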

12.1.8 Typeglobs (Aliases)

Typeglobs are an internal type that Perl uses to create a symbol table, containing the namespace entries for a package; for example, the package we have been working in for all the examples thus far is called main and provides a namespace for all of its identifiers (except those preceded by the my operator). This namespace is created as a hash using typeglobs. We will discuss symbol tables and typeglobs in Chapter 13, “Modularize It, Package It, and Send It to the Library!”

Before Perl 5, typeglobs were used to create aliases, mainly for the purpose of passing arrays and hashes to functions by reference, but now that we have hard references, they are seldom used for that purpose.

Typeglobs are identifier names preceded by an *. They are a type of reference or alias. You can think of a typeglob as a way for Perl to glob onto data types; for example, *x is a typeglob. You could say it globs onto all data types named x, such as @x, %x, $x, sub x, and so on. Be careful not to confuse this with the glob function used with the shell globbing metacharacters such as the *, ?, and [ ] and used in filename expansion; see Section 16.3.5, “Globbing (Filename Expansion and Wildcards).”

Another example of a typeglob is found in modern Perl when you create a lexical filehandle as we did in Chapter 10, “Getting a Handle on Files.” As you can see in the output of the following example, the filehandle is stored as a GLOB at some address, making it a type of reference.

open(my $fh, "<", "datebook") or die $!;
print "$fh\n";
GLOB(0x7fd6c2004ee8)

Example 12.20 illustrates how typeglobs were used in the early days of Perl to pass arguments to subroutines as aliases. For an excellent discussion on typeglobs, see Chapter 3 of Advanced Perl Programming, First Edition by Sriram Srinivasan (O’Reilly, 1998).

Filehandle References and Typeglobs

One of the few ways to pass a bareword filehandle to a subroutine is by reference. You can use a typeglob to create an alias for the filehandle and then use the backslash to create a reference to the typeglob. Wow.

If you are using the modern lexical filehandles, such as $fh, which is really just a reference to a typeglob, you don’t need to use typeglobs at all. But if you are using the older style of creating filehandles with barewords or STDIN, STDOUT, STDERR, and the like, then you can use a typeglob to create a symbolic reference.
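
Both styles can be sketched with one subroutine. The name write_line is hypothetical, and the in-memory filehandle (opening a reference to the scalar $buffer) is used here only so the sketch needs no file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: print a line to whatever handle is passed in.
sub write_line {
    my ($handle, $message) = @_;
    print {$handle} "$message\n";    # the block tells print which handle to use
    return;
}

# Older style: pass a reference to the typeglob of a bareword filehandle.
write_line(\*STDOUT, "sent through a reference to the STDOUT typeglob");

# Modern style: a lexical filehandle is already a scalar, so pass it as is.
open(my $fh, ">", \my $buffer) or die "Cannot open in-memory file: $!";
write_line($fh, "sent through a lexical filehandle");
close($fh);

print "The buffer holds: $buffer";
```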

12.2 What You Should Know

1. What is the difference between a symbolic and a hard reference?

2. What is a typeglob?

3. How do you create a reference to a hash?

4. How can you tell an anonymous array from a named array?

5. How do you dereference $ref where $ref = { 'Name' => "John" };?

6. How do you dereference $p where $p = \$x;?

7. What is meant by a nested hash?

8. How do you create a two-dimensional array?

9. What is the advantage of passing by reference?

10. What is the purpose of the ref function?

11. How would you create an array of hashes using a reference?

12.3 What’s Next?

Next, we will expand your horizons and go from the “introverted” Perl programmer to the “extroverted” programmer. Instead of writing stand-alone scripts, you will start learning how to use the libraries and modules already provided by Perl. You will explore CPAN and learn how to download and use modules that other programmers have written.

You will understand packages and namespaces and how to export and import symbols, how to use the standard Perl library, and how to create your own. You will learn how to create procedural modules and how to store and use them.

Exercise 12: It’s Not Polite to Point!

1. Rewrite tripper (from Chapter 11) to take two references as arguments and copy the arguments from the @_ in the subroutine into two my variables.

2. Create a hash named employees with the following three keys:

Name

Ssn

Salary

3. The values will be assigned as undefined (undef is a built-in Perl function). For example: Name => undef,

a. Create a reference to the hash.

b. Assign values to each of the keys using the reference.

c. Print the keys and values of the hash using the built-in each function and the reference.

d. Print the value of the reference; in other words, what the reference variable contains, not what it points to.

4. Rewrite the exercise so the hash is anonymous, and assign the anonymous hash to a reference. Delete one of the keys from the hash using the reference (use the delete function).

5. Write a program that will contain the following structure:

$student = {    Name    => undef,
                SSN     => undef,
                Friends => [],
                Grades  => {   Science => [],
                               Math    => [],
                               English => [],
                           },
           };

Use the reference to assign and display output resembling the following:

Name is John Smith.
Social Security number is 510-23-1232.
Friends are Tom, Bert, Nick.
Grades are:
              Science--100, 83, 77
              Math--90, 89, 85
              English--76, 77, 65

6. Write a program that contains a reference to an anonymous subroutine. Call the subroutine passing the hash you created in Exercise 4. The subroutine will display the hash sorted by keys.
