Chapter 5. What’s in a Name?


5.1 More About Data Types

By the end of this chapter, you will be able to read the following Perl code:

use strict;
use warnings;
my @l = qw/a b c d d a e b a b d e f/;
my %hash=();

foreach my $key (@l){
   $hash{$key} = $key;
}
print join(" ", sort keys %hash), "\n";

Again, please take note that each line of code, in most of the examples throughout this book, is numbered. The output and explanations are also numbered to match the numbers in the code. When copying examples into your text editor, don’t include these numbers, or you will generate errors.

5.1.1 Basic Data Types (Scalar, Array, Hash)

In Chapter 3, “Perl Scripts,” we briefly discussed scalars. In this chapter, we will cover scalars in more depth, as well as arrays and hashes. It should be noted that Perl does not provide the traditional data types, such as int, float, double, char, and so on. It bundles all these types into one type, the scalar. A scalar can represent an integer, float, string, and so on, and can also be used to create aggregate or composite types, such as arrays and hashes.

Unlike C or Java, Perl variables don’t have to be declared before being used, and you do not have to specify what kind of data will be stored there. Variables spring to life just by the mere mention of them. You can assign strings, numbers, or a combination of these to Perl variables and Perl will figure out what the type is. You may store a number or a list of numbers in a variable and then later change your mind and store a string there. Perl doesn’t care.

A scalar variable contains a single value (for example, one string or one number), an array variable contains an ordered list of values indexed by a non-negative integer (starting at zero), and a hash contains an unordered set of key/value pairs indexed by a string (the key) that is associated with a corresponding value (see Figure 5.1). (See Section 5.2, “Scalars, Arrays, and Hashes.”)

Image

Figure 5.1 Namespaces for scalars, arrays, and hashes in package main.

5.1.2 Package, Scope, Privacy, and Strictness

Package and Scope

The Perl sample programs you have seen in the previous chapters are compiled internally into what is called a package, which provides a namespace for variables.

An analogy often used to describe a package is the naming of a person. In the Johnson family, there is a boy named James. James is known to his family and does not have to qualify his name with a last name every time he is being called to dinner. “James, sit down at the table” is enough. However, in the school he attends there are several boys named James. The correct James is identified by his last name, for example, “James Johnson, go to the principal’s office.”

In a Perl program, “James” represents a variable and his family name, “Johnson,” a package. The default package is called main. If you create a variable, $name, for example, $name belongs to the main package and could be identified as $main::name, but qualifying the variable at this point is unnecessary as long as we are working in a single file and using the default package, main. Later when working with modules, we will step outside of the package main. This would be like James going to school. Then we could have a conflict if two variables from different packages had the same name and would have to qualify which package they belong to. For now, we will stay in the main package. When you see the word main in a warning or error message, just be aware that it is a reference to something going on in your main package.

The scope of a variable determines where it is visible in the program. In the Perl scripts you have seen so far, the variables live in the package main and are visible to the entire script file (that is, global in scope). Global variables, also called package variables, can be changed anywhere within the current package (and other packages), and the change will permanently affect the variable. To keep variables hidden within their file, block, or subroutine, we can define lexical variables. One way Perl does this is with the my operator. An entire file can be thought of as a block, but we normally think of a block as a set of statements enclosed within curly braces. If a variable is declared as a my variable within a block, it is visible (that is, accessible) within that block and any nested blocks. It is not visible outside the block. If a variable is declared with my at the file level, then the variable is visible throughout the file. See Example 5.1.
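
The scoping rules just described can be sketched as follows. This is a minimal illustration, not the book's Example 5.1; the variable names $name and $nickname are made up for the sketch.

```perl
use strict;
use warnings;

my $name = "James";          # file-level lexical: visible to the end of the file

{
    my $nickname = "Jim";    # block-level lexical: visible only inside these braces
    print "$name is also called $nickname\n";   # both variables are visible here
}

# The block-level variable is out of scope here; with strict turned on,
# mentioning it at this point would stop the program at compile time.
print "Outside the block, only $name is visible\n";
```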

The purpose in mentioning packages and scope now is to let you know that the default scope of variables in the default main package, your script, is global; that is, accessible throughout the script. To help avoid the future problems caused by global variables, it is a good habit (and often a required practice) to keep variables private by using the my operator. This is where the strict pragma comes in.

The strict pragma (a pragma is a compiler directive) is a special Perl module that directs the compiler to abort the program if certain conditions are not met. It targets barewords, symbolic references, and global variables. For small practice scripts within a single file, using strict isn’t necessary, but it is a good, and often required, practice to use it (a topic you can expect to come up in a Perl job interview!).

In the following examples, we will use strict primarily to target global variables, causing your program to abort if you don’t use the my operator when declaring them.

The warnings and strict pragmas together are used to help you find typos, spelling errors, and global variables. A warning alone will not cause your program to die, but with strict turned on, the program will abort if you disobey its restrictions. With the small examples in this book, the warnings are always turned on, but we will not turn on strict until later.

5.1.3 Naming Conventions

Variables are identified by the “funny characters” that precede them. Scalar variables are preceded by a $ sign, array variables are preceded by an @ sign, and hash variables are preceded by a % sign. Since the “funny characters” (properly called sigils) indicate what type of variable you are using, you can use the same name for a scalar, array, or hash (or a function, filehandle, and so on) and not worry about a naming conflict. For example, $name, @name, and %name are all different variables; the first is a scalar, the second is an array, and the last is a hash.1

1. Using the same name is perfectly legal, but not recommended; it makes reading the program too confusing.

Since reserved words and filehandles are not preceded by a special character, variable names will not conflict with them. Names are case sensitive. The variables named $Num, $num, and $NUM are all different. If a variable starts with a letter, it may consist of any number of letters (an underscore counts as a letter) and/or digits. If the variable does not start with a letter, it must consist of only one character. Perl has a set of special variables (for example, $_, $^, $., $1, $2) that fall into this category. (See Section A.2, “Special Variables,” in Appendix A.) In special cases, variables may also be preceded with a single quote, but only when packages are used. An uninitialized variable has the value undef, which Perl treats as zero in numeric context and as the empty string in string context.

5.1.4 Assignment Statements

The assignment operator, the equal sign (=), is used to assign the value on its right-hand side to a variable on its left-hand side. Any value that can be “assigned to” represents a named region of storage and is called an lvalue.2 Perl reports an error if the operand on the left-hand side of the assignment operator does not represent an lvalue.

2. The value on the left-hand side of the equal sign is called an lvalue, and the value on the right-hand side is called an rvalue.

When assigning a value or values to a variable, if the variable on the left-hand side of the equal sign is a scalar, Perl evaluates the expression on the right-hand side in a scalar context. If the variable on the left of the equal sign is an array, then Perl evaluates the expression on the right in an array or list context (see Section 5.2, “Scalars, Arrays, and Hashes”).
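
A short sketch of how the left-hand side of an assignment sets the context. The array @pals and its contents are illustrative only.

```perl
use strict;
use warnings;

my @pals = ("John", "Sam", "Nicky");

my $count   = @pals;   # scalar on the left: the array is evaluated in scalar context
my @copy    = @pals;   # array on the left: list context, all elements are copied
my ($first) = @pals;   # parentheses make the left side a list, so $first gets "John"

print "$count $first @copy\n";   # prints: 3 John John Sam Nicky
```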

5.2 Scalars, Arrays, and Hashes

Now that we have discussed the basics of Perl variables (types, visibility, funny characters, and so forth), we can look at them in more depth. Perhaps a review of the quoting rules detailed in Chapter 4, “Getting a Handle on Printing,” would be helpful at this time.

5.2.1 Scalar Variables

Scalar variables hold a single number or string3 and are preceded by a dollar sign ($). Perl scalars need a preceding dollar sign whenever the variable is referenced, even when the scalar is being assigned a value.

3. References are also stored in scalar variables.

Assignment

When making an assignment, the value on the right-hand side of the equal sign is evaluated as a single value (that is, its context is scalar). A quoted string, then, is considered a single value even if it contains many words.

The defined Function

If a scalar has neither a valid string nor a valid numeric value, it is undefined. The defined function allows you to check for the validity of a variable’s value. It returns 1 (true) if the variable has a value other than undef, and an empty string (false) if it does not.

The undef Function

When you define a variable without giving it a value, such as

my $name;

the initial value is undef.

You can use the undef function to undefine an already defined variable. It releases whatever memory was allocated for the variable. The function returns the undefined value. This function also releases storage associated with arrays and subroutines.
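
The following sketch shows defined and undef working together. The variable names and the value "Tommy" are invented for illustration.

```perl
use strict;
use warnings;

my $name;                                   # declared but not assigned: value is undef
print "name is not defined\n" unless defined $name;

$name = "Tommy";
print "name is now $name\n" if defined $name;

undef $name;                                # back to the undefined value
my $state = defined($name) ? "defined" : "undefined";
print "After undef, name is $state\n";      # prints: After undef, name is undefined
```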

The $_ Scalar Variable

The $_ (called a topic variable4) is a ubiquitous little character. Although it is very useful in Perl scripts, it is often not seen, somewhat like your shadow: sometimes you see it; sometimes you don’t. It is used as the default pattern space for searches, for functions that require a scalar argument, and to hold the current line when looping through a file. Once a value is assigned to $_, functions such as chomp, split, and print will use $_ as their argument when none is given. You will learn more about functions and their arguments later, but for now, consider the following example.

4. A topic variable is a special variable with a very short name, which in many cases can be omitted.
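
A minimal sketch of $_ standing in as the default argument. The strings used here are arbitrary sample data.

```perl
use strict;
use warnings;

for ("apple", "banana", "cherry") {   # no loop variable named, so each item lands in $_
    print;                            # same as: print $_;
    print "\n";
}

$_ = "a line of input\n";
chomp;                                # same as: chomp($_);
my @words = split;                    # same as: split(' ', $_);
print scalar(@words), " words\n";     # prints: 4 words
```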

The $_ Scalar and Reading Input from Files

When looping through a file, the $_ is often used as a holding place for each line as it is read. In the following example, a text file called datebook.txt is opened for reading. The filehandle is $fh, a user-defined variable to represent the real file, datebook.txt. Each time the loop is entered, a line is read from the file. But where does the line go? It is implicitly assigned to the $_ variable. The next time the loop is entered, a new line is read from the file and assigned to $_, overwriting the previous line stored there. The loop ends when the end of file is reached. The print function, although it appears to be printing nothing, will print the value of $_ each time the loop block is entered.
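
The loop described above might look like the sketch below. So that it runs without a file on disk, it reads from an in-memory string instead of datebook.txt; the two lines of data are invented placeholders.

```perl
use strict;
use warnings;

# Stand-in for datebook.txt (assumption: the real file's contents are not shown here).
my $data = "first line of the datebook\nsecond line of the datebook\n";

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $lines = 0;
while (<$fh>) {      # each line read is implicitly assigned to $_
    print;           # prints $_, newline included
    $lines++;
}
close $fh;

print "Read $lines lines\n";   # prints: Read 2 lines
```

To read the real file, replace the reference \$data with the filename, as in open(my $fh, '<', 'datebook.txt').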

5.2.2 Arrays

Let’s say when you moved into town, you made one friend. That friend can be stored in a scalar as $friend = "John". Now let’s say a few months have gone by since you moved, and now you have a whole bunch of new friends. In that case, you could create a list of friends, give the list one name, and store your friends in a Perl array; for example, @pals = ("John", "Mary", "Sanjay", "Archie").

When you have a collection of similar data elements, it is easier to use an array than to create a separate variable for each of the elements. The array name allows you to associate a single variable name with a list of data elements. Each of the elements in the list is referenced by its name and a subscript (also called an index).

Perl, unlike C-like languages, doesn’t care whether the elements of an array are of the same data type. They can be a mix of numbers and strings. To Perl, an array is a list containing an ordered set of scalars. The name of the array starts with an @ sign and the list is enclosed in parentheses, each element assigned an index value starting at zero (see Figure 5.2).

Image

Figure 5.2 A scalar variable and an array variable.

Assignment

If the array is initialized, the elements are enclosed in parentheses, and each element is separated by a comma. The list must be parenthesized because the comma operator has lower precedence than the assignment operator. Elements in an array are simply scalars.

The qw construct can also be used to quote words in a list (similar to qq, q, and qx). The items in the list are treated as singly quoted words and the comma is also provided.

$pal = "John";  # Scalar holds one value
@pals = ("John", "Sam", "Nicky", "Jake" );  # Array holds a list of values
@pals = qw(John Sam Nicky Jake);  # qw means quote word and include comma

Output and Input Special Variables ($, and $")

The $, is a special default global variable, called the output field separator. When used by the print function to print a list or an array (not enclosed in quotes), this variable separates the elements and is initially set to undef. For example, print 1,2,3 would output 123. Although you can assign a different value to the $, it’s not a good idea, as once changed, it will affect your whole program. (The join function would provide a better solution.)

The $" is a special scalar variable, called the list separator, used to separate the elements of an array that is interpolated into a double-quoted string, and is by default a single space. For example, when you print an array enclosed in double quotes, the value of $" is inserted between the elements, so by default you get a space between them.
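
A small sketch of both separators. It uses local inside a bare block so the changes do not leak into the rest of the program; that choice, and the sample array, are assumptions of this sketch rather than something the text prescribes.

```perl
use strict;
use warnings;

my @pals = qw(John Sam Nicky);

my $spaced = "@pals";          # list separator is a space by default: John Sam Nicky
my $dashed;
{
    local $" = "-";            # change the list separator for this block only
    $dashed = "@pals";         # John-Sam-Nicky
}

print @pals, "\n";             # output field separator is undef: JohnSamNicky
{
    local $, = ":";            # change the output field separator for this block only
    print @pals, "\n";         # John:Sam:Nicky: followed by the newline
}
print join(":", @pals), "\n";  # join gives John:Sam:Nicky without touching $,
```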

Array Size

$#arrayname returns the largest index value in the array; that is, the index value of its last element. Since the array indices start at zero, this value is one less than the array size. The $#arrayname variable can also be used to shorten or truncate the size of the array.

To get the size of an array, you can assign it to a scalar or use the built-in scalar function, which, when used with an array, forces scalar context and returns the size of the array as a single value. (This is defined as a unary operator. See perlop for more details.)
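
The ways of finding and changing an array's size can be sketched like this. The array @grades is sample data.

```perl
use strict;
use warnings;

my @grades = (90, 89, 78, 100, 87);

my $last_index = $#grades;          # 4, the index of the last element
my $size       = @grades;           # 5, array assigned to a scalar
my $also_size  = scalar(@grades);   # 5, scalar context forced explicitly

$#grades = 2;                       # truncate: only indices 0..2 remain
print "@grades\n";                  # prints: 90 89 78
```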

The Range Operator and Array Assignment

The .. operator, called the range operator, when used in a list context, returns a list of values starting from the left value to the right value, counting by ones.
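
A brief sketch of the range operator filling arrays. The array names are illustrative.

```perl
use strict;
use warnings;

my @digits  = (0 .. 9);        # 0 1 2 3 4 5 6 7 8 9
my @letters = ('a' .. 'e');    # ranges work on letters too: a b c d e
my @mixed   = (-2 .. 2, 10);   # a range can be part of a longer list

print "@digits\n";
print "@letters\n";
print "@mixed\n";              # prints: -2 -1 0 1 2 10
```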

Accessing Elements

An array is an ordered list of scalars. To reference the individual elements in an array, each element (a scalar) is preceded by a dollar sign. The index starts at 0, followed by positive whole numbers. For example, in the array @colors, the first element in the array is $colors[0], the next element is $colors[1], and so forth. You can also access elements starting at the end of an array with the index value of -1 and continue downward; for example, -2, -3, and so forth.

1. To assign a list of values to an array:

@colors = qw( green red blue yellow);

2. To print the whole array, use the @:

print "@colors\n";

3. To print single elements of the array:

print "$colors[0]  $colors[1]\n";

4. To print more than one element (meaning, a list):

print "@colors[1,3]\n";  # Now the index values are in a list,
                          # requiring the @ rather than the $ sign.

Image

Figure 5.3 Array elements.

Looping Through an Array with the foreach Loop

One of the best ways to traverse the elements of an array is with Perl’s foreach loop. (See Chapter 7, “If Only, Unconditionally, Forever,” for a thorough discussion.)

This control structure steps through each element of a list (enclosed in parentheses) using a scalar variable as a loop variable. The loop variable references, one at a time, each element in the list, and for each element, the block of statements following the list is executed. When all of the list items have been processed, the loop ends. If the loop variable is missing, $_, the default scalar, is used. You can use a named array or create a list within parentheses.

You may also see code where the word for is used instead of foreach. This is because for and foreach are synonyms. In these examples, foreach is used simply to make it clear that we are going through a list, one element at a time; that is, “for each” element in the list.
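
Both forms of the loop can be sketched as follows, once with a named loop variable and once relying on $_. The data is illustrative.

```perl
use strict;
use warnings;

my @colors = qw(green red blue yellow);

foreach my $color (@colors) {       # named loop variable
    print "Color: $color\n";
}

my $total = 0;
for (1 .. 5) {                      # for is a synonym; no loop variable, so $_ is used
    $total += $_;
}
print "Total: $total\n";            # prints: Total: 15
```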

Array Copy and Slices

When you assign one array to another array, a copy is made. It’s that simple. Unlike many languages, you are not responsible for the type of data the new array will hold or how many elements it will need. Perl handles the memory allocation and the type of data that will be stored in each element of the new array.

A slice accesses several elements of a list, an array, or a hash simultaneously using a list of index values. You can use a slice to copy some elements of an array into another and also assign values to a slice. If the list on the right-hand side of the assignment operator is larger than the one on the left-hand side, the unused values are discarded. If it is smaller, the leftover elements on the left-hand side are assigned undef. As indicated in the following example, the array indices in the slice do not have to be consecutively numbered; each element is assigned the corresponding value from the array on the right-hand side of the assignment operator.
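
A minimal sketch of copying an array and of reading from and assigning to slices. All names and values are sample data.

```perl
use strict;
use warnings;

my @names = qw(Tom Dick Harry Pete);

my @copy = @names;                      # a full, independent copy
$copy[0] = "Tim";                       # changing the copy leaves @names alone

my @pals = @names[1, 3];                # slice with non-consecutive indices: Dick Pete
my ($first, $second, $third) = @pals;   # right side is shorter, so $third is undef

my @colors = qw(red green blue);
@colors[0, 2] = ("pink", "navy");       # assign to a slice: pink green navy

print "@names | @copy | @pals | @colors\n";
```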

Multidimensional Arrays—Lists of Lists

Multidimensional arrays are sometimes called tables or matrices. They consist of rows and columns and can be represented with multiple subscripts. In a two-dimensional array, the first subscript represents the row, and the second subscript represents the column.

Perl allows this type of array, but it requires an understanding of references. We will cover this in detail in Chapter 12, “Does This Job Require a Reference?”

5.2.3 Hashes—Unordered Lists

A hash (in some languages called an associative array, map, table, or dictionary) is a variable consisting of one or more pairs of scalars, either strings or numbers. Hashes are often used to create tables and complex data structures, to find duplicate entries in a file or array, and to create Perl objects. We will cover objects in detail in Chapter 14, “Bless Those Things! (Object-Oriented Perl).”

Hashes are defined as an unordered list of key/value pairs, similar to a table where the keys are on the left-hand side and the values associated with those keys are on the right-hand side. The name of the hash is preceded by the % and the keys and values are separated by a =>, called the fat comma or digraph operator.

Whereas arrays are ordered lists with numeric indices starting at 0, hashes are unordered lists with string indices, called keys, stored randomly. (When you print out the hash, don’t expect to see the output ordered just as you typed it!)

To summarize, the keys in a hash must be unique. The keys need not be quoted unless they begin with a number or contain hyphens, spaces, or special characters. Since the keys are really just strings, to be safe, quoting the keys (either single or double quotes) can prevent unwanted side effects. It’s up to you. The values associated with the key can be much more complex than what we are showing here, and require an understanding of Perl references. These complex types are discussed in Chapter 12, “Does This Job Require a Reference?”

my %pet = ("Name"  => "Sneaky",
           "Type"  => "cat",
           "Owner" => "Carol",
           "Color" => "yellow",
           );

So for this example, the keys and values for the hash called %pet, are as follows:

Key      Value
Name     Sneaky
Type     cat
Owner    Carol
Color    yellow

Assignment

As in scalars and arrays, a hash variable must be defined before its elements can be referenced. Since a hash consists of pairs of values, indexed by the first element of each pair, if one of the elements in a pair is missing, the association of the keys and their respective values will be affected. When assigning keys and values, make sure you have a key associated with its corresponding value. When indexing a hash, curly braces are used instead of square brackets.

Accessing Hash Values

When accessing the values of a hash, the subscript or index consists of the key enclosed in curly braces. Perl provides a set of functions to list the keys, values, and each of the elements of the hash.

Due to the internal hashing techniques used to store the keys, Perl does not guarantee the order in which an entire hash is printed.
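
A short sketch of reading, changing, and adding hash values with curly-brace subscripts. It reuses the %pet hash from earlier; the new owner "Dana" and the Age key are invented for the sketch.

```perl
use strict;
use warnings;

my %pet = (
    Name  => "Sneaky",
    Type  => "cat",
    Owner => "Carol",
    Color => "yellow",
);

print "$pet{Name} is a $pet{Color} $pet{Type}\n";   # prints: Sneaky is a yellow cat

$pet{Owner} = "Dana";            # replace the value stored under an existing key
$pet{Age}   = 3;                 # add a new key/value pair

foreach my $key (sort keys %pet) {       # sort the keys for a predictable order
    print "$key: $pet{$key}\n";
}
```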

Hash Slices

A hash slice is a list of hash values selected by a list of keys. The hash name is preceded by the @ symbol and followed by a list of hash keys enclosed in curly braces. The hash slice lets you access one or more hash elements in one statement, rather than by going through a loop.
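
A minimal sketch of reading from and assigning to a hash slice, again using the %pet hash. The replacement values are invented.

```perl
use strict;
use warnings;

my %pet = (Name => "Sneaky", Type => "cat", Owner => "Carol", Color => "yellow");

my @info = @pet{"Name", "Type"};             # read two values at once: Sneaky cat
@pet{qw(Owner Color)} = ("Dana", "gray");    # assign two values at once

print "@info\n";                             # prints: Sneaky cat
print "$pet{Owner} $pet{Color}\n";           # prints: Dana gray
```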

Removing Duplicates from a List Using a Hash

Because all keys in a hash must be unique, one way to remove duplicates from a list, whether an array or file, is to list items as keys in a hash. The values can be used to keep track of the number of duplicates or simply left undefined. The keys of the new hash will contain no duplicates. See the section, “The map Function,” later in this chapter, for more examples.
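
One way to sketch this, using the hash values to count occurrences. The list is the one from the start of the chapter; the hash name %seen is a common convention, not something the text requires.

```perl
use strict;
use warnings;

my @list = qw(a b c d d a e b a b d e f);
my %seen;

foreach my $item (@list) {
    $seen{$item}++;                    # the value counts how often each item occurs
}

my @unique = sort keys %seen;          # the keys contain no duplicates
print "@unique\n";                     # prints: a b c d e f
print "a appeared $seen{a} times\n";   # prints: a appeared 3 times
```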

5.2.4 Complex Data Structures

By combining arrays and hashes, you can make more complex data structures, such as arrays of hashes, hashes with nested hashes, arrays of arrays, and so on. Here is an example of an array of arrays requiring references.

my $matrix = [
               [ 0, 2, 4 ],
               [ 4, 1, 32 ],
               [ 12, 15, 17 ]
             ] ;

To create these structures, you should have an understanding of how Perl references and complex data structures are used. (See Chapter 12, “Does This Job Require a Reference?”)

5.3 Array Functions

Arrays can grow and shrink. The Perl array functions allow you to insert or delete elements of the array from the front, middle, or end of the list, to sort arrays, perform calculations on elements, to search for patterns, and more.

5.3.1 Adding Elements to an Array

The push Function

The push function pushes values onto the end of an array, thereby increasing the length of the array (see Figure 5.5).

Image

Figure 5.5 Adding elements to an array.

The unshift Function

The unshift function prepends LIST to the front of the array (see Figure 5.6).

Image

Figure 5.6 Using the unshift function to add elements to the beginning of an array.
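
A brief sketch of growing an array at either end. The names are sample data.

```perl
use strict;
use warnings;

my @names = ("Bob", "Dan");

push(@names, "Jim", "Joe");        # append to the end:    Bob Dan Jim Joe
unshift(@names, "Al", "Ann");      # prepend to the front: Al Ann Bob Dan Jim Joe

print "@names\n";
print scalar(@names), "\n";        # prints: 6
```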

5.3.2 Removing and Replacing Elements

The delete Function

If you have a row of shoeboxes and take a pair of shoes from one of the boxes, the number of shoeboxes remains the same, but one of them is now empty. That is how delete works with arrays. The delete function allows you to remove a value from an element of an array, but not the element itself. The value deleted is simply undefined. (See Figure 5.7.) You may still find delete used on arrays in older programs, but perldoc.perl.org warns against it: delete should be reserved for removing elements from a hash, and calling it on array values is described there as deprecated and likely to be removed in a future version of Perl.

Image

Figure 5.7 Using the delete function to remove elements from an array.

Instead, use the splice function to delete and replace elements from an array, while at the same time renumbering the index values.

The splice Function

For the delete function, we described a row of shoeboxes in which a pair of shoes was removed from one of the boxes, but the box itself remained in the row. With splice, the box and its shoes can be removed and the remaining boxes pushed into place. (See Figure 5.8.) We could even take out a pair of shoes and replace them with a different pair (see Figure 5.9), or add a new box of shoes anywhere in the row. Put simply, the splice function removes and replaces elements in an array. The OFFSET is the starting position where elements are to be removed. The LENGTH is the number of items from the OFFSET position to be removed. The LIST is an optional list of new elements that are to replace the old ones. All index values are renumbered for the new array.

Image

Figure 5.8 Using the splice function to remove or replace elements in an array.

Image

Figure 5.9 Splicing and replacing elements in an array.
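
The three uses of splice described above (remove, replace, insert) can be sketched as follows. The color names are sample data.

```perl
use strict;
use warnings;

my @colors = qw(red green purple blue brown);

# splice(ARRAY, OFFSET, LENGTH, LIST) returns the elements it removes.
my @removed = splice(@colors, 2, 2);          # remove purple and blue
print "@colors\n";                            # prints: red green brown

splice(@colors, 1, 1, "yellow", "orange");    # replace green with two elements
print "@colors\n";                            # prints: red yellow orange brown

splice(@colors, 1, 0, "white");               # LENGTH 0: insert without removing
print "@colors\n";                            # prints: red white yellow orange brown
```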

The pop Function

The pop function pops off the last element of an array and returns it. The array size is subsequently decreased by one. (See Figure 5.10.)

Image

Figure 5.10 Using the pop function to pop the last element off the array.

The shift Function

The shift function shifts off and returns the first element of an array, decreasing the size of the array by one element. (See Figure 5.11.) If ARRAY is omitted, then the @ARGV array is shifted. Inside a subroutine, the argument list, stored in the @_ array, is shifted instead.

Image

Figure 5.11 Using the shift function to return the first element of an array.
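
A short sketch of shrinking an array from either end with pop and shift. The names are sample data.

```perl
use strict;
use warnings;

my @names = qw(Bob Dan Tom Guy);

my $last  = pop(@names);      # returns Guy; @names is now Bob Dan Tom
my $first = shift(@names);    # returns Bob; @names is now Dan Tom

print "Removed $first and $last, left with: @names\n";
```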

5.3.3 Deleting Newlines

The chop and chomp Functions (with Lists)

The chop function chops off the last character of a string and returns the chopped character, usually for removing the newline after input is assigned to a scalar variable. If a list is chopped, chop will remove the last character of each string in the list.

The chomp function removes a newline character at the end of a string or for each element in a list.
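
The difference between the two functions shows up when a string does not end in a newline, as in this sketch. The strings are sample data.

```perl
use strict;
use warnings;

my @lines = ("red\n", "green\n", "blue");

chomp(@lines);                 # removes trailing newlines only: red green blue
print "@lines\n";

my $word    = "Perl";
my $chopped = chop($word);     # removes and returns the last character, whatever it is
print "$word $chopped\n";      # prints: Per l

chop(@lines);                  # chops one character from each element: re gree blu
print "@lines\n";
```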

5.3.4 Searching for Elements and Index Values

The grep Function

The grep function is similar to the UNIX grep command in that it searches for patterns of characters, called regular expressions. However, unlike the UNIX grep, it is not limited to using regular expressions. Perl’s grep evaluates the expression (EXPR) for each element of the array (LIST), locally setting $_ to each element. In list context, the return value is a list consisting of those elements for which the expression evaluated as true. In scalar context, the return value is the number of times the expression was true (that is, the number of times the pattern was found).

The next example shows you how to find the index value(s) for specific elements in an array using the built-in grep function. (If you have version 5.10+, you may want to use the List::MoreUtils module, which is not part of the standard Perl library but is available from CPAN.)
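
A minimal sketch of grep in list and scalar context, and of the index-finding idiom, using only the built-in function. The word list is sample data.

```perl
use strict;
use warnings;

my @list = qw(tomatoes tomorrow potatoes phantom Tommy);

my @matches = grep { /tom/i } @list;            # list context: the matching elements
my $count   = grep { /tom/i } @list;            # scalar context: how many matched
my @long    = grep { length($_) > 7 } @list;    # any expression works, not just a regex

# To get index values, grep over the indices rather than the elements.
my @indices = grep { $list[$_] =~ /^p/ } 0 .. $#list;

print "@matches\n";     # prints: tomatoes tomorrow phantom Tommy
print "$count\n";       # prints: 4
print "@long\n";        # prints: tomatoes tomorrow potatoes
print "@indices\n";     # prints: 2 3
```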

5.3.5 Creating a List from a Scalar

The split Function

The split function splits up a string (EXPR) by some delimiter (whitespace, by default) and returns a list. (See Figure 5.12.) The first argument is the delimiter, and the second is the string to be split. The Perl split function can be used to create fields when processing files, just as you would with the UNIX awk command. If a string is not supplied as the expression, the $_ string is split.

The DELIMITER pattern matches the delimiters that are used to separate the fields. If DELIMITER is omitted, the delimiter defaults to whitespace (spaces, tabs, or newlines). If DELIMITER doesn’t match anywhere in the string, split returns the original string as a single field. You can specify more than one delimiter, using the regular expression character class, [ ]. For example, [ \t:] matches a single space, tab, or colon, and [ \t:]+ treats a run of one or more of those characters as one delimiter.

To split on a dot (.), use /\./ to escape the dot from its regular expression metacharacter meaning.

LIMIT specifies the number of fields that can be split. If there are more than LIMIT fields, the remaining fields will all be part of the last one. If the LIMIT is omitted, the split function has its own LIMIT, which is one more than the number of fields in EXPR. (See the -a switch for autosplit mode, in Appendix A, “Perl Built-ins, Pragmas, Modules, and the Debugger.”)

Image

Figure 5.12 Using the split function to create an array from a scalar.
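
The main forms of split can be sketched as follows. The record string is invented sample data.

```perl
use strict;
use warnings;

my $record = "Jane Doe:555-0100:12 Elm Street";        # invented sample record

my ($name, $phone, $address) = split(/:/, $record);    # split on a colon
my @words  = split(' ', "  one two\tthree\n");         # whitespace; leading space is skipped
my @octets = split(/\./, "192.168.0.1");               # the dot must be escaped
my @two    = split(/:/, $record, 2);                   # LIMIT 2: the rest stays in the last field
my @multi  = split(/[ \t:]+/, "a: b\tc");              # a run of spaces, tabs, or colons

print "$name | $phone | $address\n";
print scalar(@words), " words, ", scalar(@octets), " octets\n";   # prints: 3 words, 4 octets
print "$two[1]\n";                                     # prints: 555-0100:12 Elm Street
print "@multi\n";                                      # prints: a b c
```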

5.3.6 Creating a Scalar from a List

The join Function

The join function joins the elements of an array into a single string and separates each element of the array with a given delimiter, sometimes called the “glue” character(s) since it glues together the items in a list (opposite of split). (See Figure 5.13.) The expression DELIMITER is the value of the string that will join the array elements in LIST.

Image

Figure 5.13 Using the join function to join elements of an array with a comma.
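
A short sketch of join as the opposite of split. The field values are invented sample data.

```perl
use strict;
use warnings;

my @fields = ("Jane Doe", "555-0100", "12 Elm Street");   # invented sample data

my $record = join(":", @fields);     # Jane Doe:555-0100:12 Elm Street
my $csv    = join(", ", 1 .. 5);     # the glue can be more than one character

print "$record\n";
print "$csv\n";                      # prints: 1, 2, 3, 4, 5
```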

5.3.7 Transforming an Array

The map Function

If you have an array and want to perform the same action on each element of the array without using a for loop, the map function may be an option. The map function maps each of the values in an array to an expression or block, returning another list with the results of the mapping. It lets you change the values of the original list.

Using map to Change All Elements of an Array

In the following example, the chr function is applied or mapped to each element of an array and returns a new array showing the results. (See Figure 5.14.)

Image

Figure 5.14 Using the map function to change elements in an array.
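
A minimal sketch of map applying chr, and then an arithmetic expression, to every element. The character codes were chosen for the sketch and are not the book's data.

```perl
use strict;
use warnings;

my @codes   = (80, 101, 114, 108);            # character codes chosen for this sketch
my @letters = map { chr($_) } @codes;         # chr applied to each element: P e r l
my @doubled = map { $_ * 2 } 1 .. 5;          # any expression can be mapped: 2 4 6 8 10

print join("", @letters), "\n";               # prints: Perl
print "@doubled\n";
```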

Using map to Remove Duplicates from an Array

The map function can be used to create a hash from an array. If you are using the array elements as keys for the new hash, any duplicates will be eliminated.
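
This idea can be sketched in one line of map. The value 1 assigned to every key is an arbitrary placeholder; only the keys matter here.

```perl
use strict;
use warnings;

my @list = qw(a b c d d a e b a b d e f);

my %hash   = map { $_ => 1 } @list;    # each element becomes a key; repeats overwrite
my @unique = sort keys %hash;

print "@unique\n";                     # prints: a b c d e f
```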

5.3.8 Sorting an Array

The sort Function

The sort function sorts and returns a sorted list. Its default is to sort alphabetically, but you can define how you want to sort by using different comparison operators. If SUBROUTINE is specified, the first argument to sort is the name of the subroutine, followed by a list of values to be sorted. If the string cmp operator is used, the values in the list will be sorted alphabetically (ASCII sort), and if the <=> operator (called the spaceship operator) is used, the values will be sorted numerically. The values are passed to the subroutine by reference and are received by the special Perl variables $a and $b, not the normal @_ array. (See Chapter 11, “How Do Subroutines Function?” for further discussion.) Do not try to modify $a or $b, as they represent the values that are being sorted.

If you want Perl to sort your data according to a particular locale, your program should include the use locale pragma. For a complete discussion, see perldoc.perl.org/perllocale.

ASCII and Numeric Sort Using Subroutine

You can either define a subroutine or use an inline function to perform customized sorting, as shown in the following examples. A note about $a and $b: they are special global Perl variables used by the sort function for comparing values. If you need more information on the operators used, see Chapter 6, “Where’s the Operator?”
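
A sketch of default, inline, and named-subroutine sorting. The subroutine name by_lowercase and the data are made up for illustration.

```perl
use strict;
use warnings;

my @words   = qw(dog cat Bird ant);
my @numbers = (10, 9, 100, 1);

my @default = sort @words;                    # ASCII sort, uppercase first: Bird ant cat dog
my @strings = sort @numbers;                  # still a string sort: 1 10 100 9
my @numeric = sort { $a <=> $b } @numbers;    # inline numeric sort: 1 9 10 100
my @desc    = sort { $b <=> $a } @numbers;    # swap $a and $b to descend: 100 10 9 1

sub by_lowercase { lc($a) cmp lc($b) }        # named comparison subroutine
my @folded = sort by_lowercase @words;        # case-insensitive: ant Bird cat dog

print "@default\n@strings\n@numeric\n@desc\n@folded\n";
```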

5.3.9 Checking the Existence of an Array Index Value

The exists Function

The exists function returns true if an array index (or hash key) exists, even if the value stored there is undef, and false if it does not. It is most commonly used when testing a hash key’s existence.
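
A small sketch contrasting exists with defined on a hash. The Owner key is deliberately given an undef value for the illustration.

```perl
use strict;
use warnings;

my %pet = (Name => "Sneaky", Owner => undef);

print "Name exists\n"         if exists $pet{Name};
print "Owner exists\n"        if exists $pet{Owner};        # true: the key is there
print "Owner has no value\n"  unless defined $pet{Owner};   # but its value is undef
print "Age does not exist\n"  unless exists $pet{Age};      # no such key
```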

5.3.10 Reversing an Array

The reverse Function

The reverse function reverses the elements in a list, so that if the values appeared in descending order, now they are in ascending order, or vice versa. In scalar context, it concatenates the list elements and returns a string with all the characters reversed; for example, in scalar context Hello, there! reverses to !ereht ,olleH.
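
A brief sketch of reverse in list and scalar context. The data is illustrative.

```perl
use strict;
use warnings;

my @names    = qw(Bob Dan Tom Guy);
my @backward = reverse @names;                       # list context: Guy Tom Dan Bob
my $string   = reverse "Hello, there!";              # scalar context: characters reversed
my @desc     = reverse sort { $a <=> $b } 3, 1, 2;   # ascending sort, then reversed: 3 2 1

print "@backward\n";
print "$string\n";      # prints: !ereht ,olleH
print "@desc\n";
```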

5.4 Hash (Associative Array) Functions

5.4.1 The keys Function

The keys function returns, in random order, a list whose elements are the keys of a hash (see also Section 5.4.2, “The values Function,” and Section 5.4.3, “The each Function”). Starting with Perl 5.12, keys also returns the index values of an array. In scalar context, it returns the number of keys (or indices).

5.4.2 The values Function

The values function returns, in random order, a list consisting of all the values of a named hash. (Starting with Perl 5.12, it will also return the values of an array.) In scalar context, it returns the number of values.

Since hashes are stored in a random order, to get the hash values in the order in which they were assigned, you can use a hash slice as shown in the following example.
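
A sketch of keys, values, and a hash slice that fetches values in a chosen order. The %weekday hash is invented for illustration.

```perl
use strict;
use warnings;

my %weekday = (1 => "Monday", 2 => "Tuesday", 3 => "Wednesday");

my @keys   = keys %weekday;            # order is not guaranteed
my @values = values %weekday;          # order is not guaranteed either
my $count  = keys %weekday;            # scalar context: the number of keys

my @in_order = @weekday{1, 2, 3};      # hash slice: values in the order of the keys listed

print "$count keys\n";                 # prints: 3 keys
print "@in_order\n";                   # prints: Monday Tuesday Wednesday
```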

5.4.3 The each Function

The each function returns, in random order, a two-element list whose elements are the key and the corresponding value of a hash. It must be called multiple times to get each key/value pair, as it only returns one set each time it is called, somewhat like reading lines from a file, one at a time.
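
A minimal sketch of each called repeatedly in a while loop. The hash contents are sample data, and the order of the printed pairs is not predictable.

```perl
use strict;
use warnings;

my %weekday = (Mon => "Monday", Tue => "Tuesday", Wed => "Wednesday");

my $pairs = 0;
while (my ($key, $value) = each %weekday) {   # one key/value pair per call
    print "$key = $value\n";
    $pairs++;
}
print "$pairs pairs\n";                       # prints: 3 pairs
```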

5.4.4 Removing Duplicates from a List with a Hash

Earlier, we used a hash to remove duplicate entries in an array. In the following example, the built-in map function is used to map each element of an array into a hash to create unique hash keys.
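A minimal sketch of the technique, reusing the list from the start of this chapter (the hash name %unique is an assumption):

```perl
use strict;
use warnings;

my @list = qw/a b c d d a e b a b d e f/;

# map returns a key/value pair for every element; because hash keys
# are unique, repeated elements collapse into a single key.
my %unique = map { $_ => 1 } @list;

my @deduped = sort keys %unique;
print "@deduped\n";    # a b c d e f
```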

5.4.5 Sorting a Hash by Keys and Values

When sorting a hash, you can sort the keys alphabetically very easily by using the built-in sort command, as we did with arrays in the preceding section. But you may want to sort the keys numerically or sort the hash by its values. To do this requires a little more work.

You can define a subroutine to compare the keys or values. (See Chapter 11, “How Do Subroutines Function?”) The subroutine will be called by the built-in sort function. It will be sent a list of keys or values to be compared. The comparison is either an ASCII (alphabetic) or a numeric comparison, depending upon the operator used. The cmp operator is used for comparing strings, and the <=> operator is used for comparing numbers. The reserved global scalars $a and $b are used in the subroutine to hold the values as they are being compared. The names of these scalars cannot be changed.

Sort Hash by Keys in Ascending Order

To perform an ASCII, or alphabetic, sort on the keys in a hash is relatively easy. Perl’s sort function is given a list of keys and returns them sorted in ascending order. A foreach loop is used to loop through the hash keys, one key at a time.
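A sketch of such a loop, using an invented %phone hash:

```perl
use strict;
use warnings;

my %phone = (Sue => '555-1212', Al => '555-3434', Kim => '555-7878');

# sort with no block compares the keys as strings, in ascending order.
my @sorted = sort keys %phone;

foreach my $name (@sorted) {
    print "$name: $phone{$name}\n";
}
# Prints the entries for Al, Kim, and Sue, in that order.
```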

Sort Hash by Keys in Reverse Order

To sort a hash by keys alphabetically and in descending order, just add the built-in reverse function to the previous example. The foreach loop is used to get each key from the hash, one at a time, after the reversed sort.
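Sketched with an invented %phone hash, the reversed sort looks like this:

```perl
use strict;
use warnings;

my %phone = (Sue => '555-1212', Al => '555-3434', Kim => '555-7878');

# sort runs first; reverse then flips the sorted list.
my @descending = reverse sort keys %phone;

foreach my $name (@descending) {
    print "$name: $phone{$name}\n";
}
# Prints the entries for Sue, Kim, and Al, in that order.
```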

Sort Hash by Keys Numerically

A user-defined subroutine is used to sort a hash by keys numerically. In the subroutine, Perl’s special $a and $b variables hold the two values being compared with the appropriate operator: <=> for a numeric comparison, cmp for a string comparison. The sort function sends a list of keys to the user-defined subroutine, and the sorted list is returned.
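The sketch below contrasts a numeric sort with the default string sort. The hash contents and the subroutine name numerically are assumptions for illustration.

```perl
use strict;
use warnings;

my %course = (101 => 'Intro', 9 => 'Lab', 20 => 'Seminar');

# Hypothetical comparison subroutine: <=> compares $a and $b as numbers.
sub numerically { $a <=> $b }

my @by_number = sort numerically keys %course;
my @by_string = sort keys %course;    # default: string comparison

print "@by_number\n";    # 9 20 101
print "@by_string\n";    # 101 20 9
```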

Numerically Sort a Hash by Values in Ascending Order

To sort a hash by its values, a user-defined subroutine is also used. The special variables $a and $b hold two keys at a time, and the subroutine compares the values stored under those keys. If $a is on the left-hand side of the comparison operator, the sort is in ascending order, and if $b is on the left-hand side, then the sort is in descending order. The <=> operator compares its operands numerically.
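In a sketch of this approach, sort is given the keys, and the comparison subroutine looks up the value for each key. The %score hash and the subroutine name are made up for illustration.

```perl
use strict;
use warnings;

my %score = (Ann => 82, Bob => 95, Cy => 67);

# $a and $b hold keys; the values stored under them are compared.
sub by_value_ascending { $score{$a} <=> $score{$b} }

my @ranked = sort by_value_ascending keys %score;

foreach my $name (@ranked) {
    print "$name: $score{$name}\n";
}
# Order printed: Cy (67), Ann (82), Bob (95)
```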

Numerically Sort a Hash by Values in Descending Order

To sort a hash numerically and in descending order by its values, a user-defined function is created as in the previous example. However, this time the $b variable is on the left-hand side of the <=> numeric operator, and the $a variable is on the right-hand side. This causes the sort function to sort in descending order.
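A sketch of the descending version, again with an invented %score hash and subroutine name:

```perl
use strict;
use warnings;

my %score = (Ann => 82, Bob => 95, Cy => 67);

# $b on the left of <=> puts the largest value first.
sub by_value_descending { $score{$b} <=> $score{$a} }

my @ranked = sort by_value_descending keys %score;

foreach my $name (@ranked) {
    print "$name: $score{$name}\n";
}
# Order printed: Bob (95), Ann (82), Cy (67)
```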

5.4.6 The delete Function

The delete function deletes a specified element from a hash. The deleted value is returned if successful. (If a value in the %ENV hash is deleted, the environment is changed. See “The %ENV Hash” in Section 5.4.8.)
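A brief sketch of delete and its return value, using a made-up %stock hash:

```perl
use strict;
use warnings;

my %stock = (apples => 12, pears => 3, plums => 7);

# delete removes the key and returns the value that was stored there.
my $removed = delete $stock{pears};

# Deleting a key that is not in the hash returns undef.
my $missing = delete $stock{kiwis};

my @left = sort keys %stock;
print "Removed: $removed\n";    # Removed: 3
print "Left: @left\n";          # Left: apples plums
```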

5.4.7 The exists Function

The exists function returns true if a hash key (or array index) exists, and false if not.
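Because exists tests for the key rather than for its value, it remains true when the value is 0 or undefined. A sketch with an invented %stock hash:

```perl
use strict;
use warnings;

my %stock = (apples => 12, pears => 0, plums => undef);

my $has_pears = exists($stock{pears}) ? 'yes' : 'no';    # yes: value is 0
my $has_plums = exists($stock{plums}) ? 'yes' : 'no';    # yes: value is undef
my $has_kiwis = exists($stock{kiwis}) ? 'yes' : 'no';    # no: key is absent

print "pears: $has_pears, plums: $has_plums, kiwis: $has_kiwis\n";
```

Testing the value directly, as in if ($stock{pears}), would treat both pears and plums as false even though the keys are present.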

5.4.8 Special Hashes

The %ENV Hash

The %ENV hash contains the environment variables handed to Perl from the parent process; for example, a shell or a Web server. The key is the name of the environment variable, and the value is what was assigned to it. If you change a value in %ENV, you will alter the environment for your Perl script and any processes spawned from it, but not the parent process. Environment variables play a significant role in CGI Perl scripts.
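The sketch below sets an environment variable, lets a child process read it, and then deletes it. The variable name MY_DEMO_SETTING is made up for illustration, and the example assumes a Unix-like shell is available to run echo.

```perl
use strict;
use warnings;

# MY_DEMO_SETTING is a hypothetical variable name used only in this sketch.
$ENV{MY_DEMO_SETTING} = 'on';

# A child process spawned from the script inherits the changed environment.
# (Assumes a Unix-like shell; \$ keeps Perl from interpolating the name.)
my $child_sees = qx{echo \$MY_DEMO_SETTING};
chomp $child_sees;
print "Child sees: $child_sees\n";

# Deleting the key removes the variable for this script and its children.
delete $ENV{MY_DEMO_SETTING};
my $still_set = exists($ENV{MY_DEMO_SETTING}) ? 'yes' : 'no';
print "Still set? $still_set\n";
```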

The %SIG Hash

The %SIG hash allows you to install handlers for signals. If, for example, you press <CTRL>+C when your program is running, that is a signal, identified by the name SIGINT. (See UNIX manual pages for a complete list of signals.) The default action of SIGINT is to interrupt your process. The signal handler is a subroutine that is automatically called when a signal is sent to the process. Normally, the handler is used to perform a clean-up operation or to check some flag value before the script aborts. (All signal handlers are assumed to be set in the main package.)

The %SIG hash contains values only for signals set within the Perl script.
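As a rough sketch, a handler for SIGINT can be installed by assigning a subroutine reference to $SIG{INT}. Here the script sends the signal to itself with kill, standing in for the user pressing <CTRL>+C. The variable names are made up, and the example assumes a UNIX-like system.

```perl
use strict;
use warnings;

my $caught = 0;

# The handler receives the signal name (without the SIG prefix).
$SIG{INT} = sub {
    my ($signal) = @_;
    $caught++;
    print "Caught SIG$signal; cleaning up\n";
};

# Send SIGINT to this process ($$ is the script's own process ID).
kill 'INT', $$;

# Restore the default action once the handler is no longer needed.
$SIG{INT} = 'DEFAULT';

print "Handler ran $caught time(s)\n";
```

Assigning the string 'IGNORE' instead of a subroutine reference tells Perl to ignore the signal altogether.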

The %INC Hash

The %INC hash contains the entries for each filename that has been included via the use or require functions. The key is the filename; the value is the location of the actual file found.
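To see what %INC holds, a script can load a module and then look up its entry. This sketch uses two modules that ship with Perl; the paths printed depend on where Perl is installed on your system.

```perl
use strict;
use warnings;
use List::Util ();         # loaded at compile time
require File::Basename;    # loaded at run time

# Keys are the file names as Perl looked them up (for example, List/Util.pm);
# values are the full paths of the files that were found.
foreach my $file (sort keys %INC) {
    print "$file => $INC{$file}\n";
}

my $util_loaded     = exists($INC{'List/Util.pm'})     ? 1 : 0;
my $basename_loaded = exists($INC{'File/Basename.pm'}) ? 1 : 0;
```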

5.4.9 Context Revisited

In summary, the way Perl evaluates variables depends on how the variables are being used; they are evaluated by context: scalar, list, or void.

If the variable on the left-hand side of an assignment statement is a scalar, the expression on the right-hand side is evaluated in a scalar context; whereas if the variable on the left-hand side is an array, the right-hand side is evaluated in a list context.
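The effect of the left-hand side can be sketched with a single array assigned in three different ways (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my @fruit = qw(apple pear peach);

my @copy    = @fruit;    # list context: all the elements are copied
my $count   = @fruit;    # scalar context: the number of elements
my ($first) = @fruit;    # parentheses force list context: first element

print "@copy\n";     # apple pear peach
print "$count\n";    # 3
print "$first\n";    # apple
```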

Void context is a special form of scalar context. It is defined by the Perl monks as a “context that doesn’t have an operator working on it. The value of a thing in void context is discarded, not used for anything...” An example of void context is when you assign a list to a scalar separating the elements with a comma. The comma operator evaluates its left argument in void context, throws it away, then evaluates the right argument, and so on, until it reaches the end of the list, discarding all but the last one.

$fruit = ("apple","pear","peach");  # $fruit is assigned "peach";
                                    # "apple" and "pear" are discarded
                                    # as useless use in void context

You’ll see examples throughout the rest of this book where context plays a major role.

5.5 What You Should Know

1. If you don’t give a variable a value, what will Perl assign to it?

2. What are “funny characters”? What is a sigil?

3. What data types are interpreted within double quotes?

4. How many numbers or strings can you store in a scalar variable?

5. In a hash, can you have more than one key with the same name? What about more than one value with the same name?

6. What function would you use to find the index value of an array if you know the value of the data stored there?

7. How does the scalar function evaluate an expression if it’s an array?

8. How do you find the size of an array?

9. What does the $" special variable do?

10. When are elements of an array or hash preceded by a $ (dollar sign)?

11. What is the difference between chop and chomp?

12. What is the difference between splice and slice?

13. What does the map function do?

14. How do you sort a numeric array? How do you sort a hash by value?

15. What function extracts both keys and values from a hash?

16. How can you remove duplicates in an array?

17. What is meant by the term scope?

18. What are “scalar” context, “list” context, and “void” context? Would you be able to write an example to demonstrate how they differ?

5.6 What’s Next?

In the next chapter, we discuss the Perl operators. We will cover the different types of assignment operators, comparison and logical operators, arithmetic and bitwise operators, how Perl sees strings and numbers, how to create a range of numbers, how to generate random numbers, and some special string functions.

Exercise 5: The Funny Characters

1. Write a script that will ask the user for his five favorite foods (read from STDIN). The foods will be stored as a string in a scalar, each food separated by a comma.

a. Split the scalar by the comma and create an array.

b. Print the array.

c. Print the first and last elements of the array.

d. Print the number of elements in the array.

e. Use an array slice of three elements in the food array and assign those values to another array. Print the new array with spaces between each of the elements.

2. Given the array @names=qw(Nick Susan Chet Dolly Bill), write a statement that would do the following:

a. Replace Susan and Chet with Ellie, Beatrice, and Charles.

b. Remove Bill from the array.

c. Add Lewis and Izzy to the end of the array.

d. Remove Nick from the beginning of the array.

e. Reverse the array.

f. Add Archie to the beginning of the array.

g. Sort the array.

h. Remove Chet and Dolly and replace them with Christian and Daniel.

3. Write a script called elective that will contain a hash. The keys will be code numbers—2CPR2B, 1UNX1B, 3SH414, 4PL400. The values will be course names—C Language, Intro to UNIX, Shell Programming, Perl Programming.

a. Sort the hash by values and print it.

b. Ask the user to type the code number for the course he plans to take this semester and print a line resembling the following:

You will be taking Shell Programming this semester.

4. Modify your elective script to produce output resembling the output below. The user will be asked to enter registration information and to select an EDP number from a menu. The course name will be printed. It doesn’t matter if the user types in the EDP number with upper- or lowercase letters. A message will confirm the user’s address and thank him for enrolling.

Output should resemble the following:

REGISTRATION INFORMATION FOR SPRING QUARTER

Today’s date is Wed Apr 19 17:40:19 PDT 2014

Please enter the following information:

Your full name: Fred Z. Stachelin

What is your Social Security Number (xxx-xx-xxxx): 004-34-1234

Your address:

     Street: 1424 Hobart St.

     City, State, Zip: Chico, CA 95926

“EDP” NUMBERS AND ELECTIVES:

-------------------------------------
2CPR2B | C Programming
-------------------------------------
1UNX1B | Intro to UNIX
-------------------------------------
4PL400 | Perl Programming
-------------------------------------
3SH414 | Shell Programming
-------------------------------------

What is the EDP number of the course you wish to take? 4pl400
The course you will be taking is “Perl Programming.”

Registration confirmation will be sent to your address at

     1424 HOBART ST.

     CHICO, CA 95926

Thank you, Fred, for enrolling.

5. Write a script called findem that will do the following:

a. Assign the contents of the datebook file to an array. (The datebook file is on the CD that accompanies this book.)

b. Ask the user for the name of a person to find. Use the built-in grep function to find the elements of the array that contain that person’s name, and count the number of times the person is found in the array. The search will ignore case.

c. Use the split function to get the current phone number.

d. Use the splice function to replace the current phone number with the new phone number, or use any of the other built-in array functions to produce output that resembles the following:

Who are you searching for? Karen

What is the new phone number for Karen? 530-222-1255

Karen’s phone number is currently 284-758-2857.

Here is the line showing the new phone number:

Karen Evich:530-222-1255:23 Edgecliff Place, Lincoln, NB 92086:7/25/53:85100

Karen was found in the array three times.

6. Write a script called tellme that will print out the names, phones, and salaries of all the people in the datebook file. To execute, type the following at the command line:

tellme datebook

Output should resemble the following:

Salary: 14500
Name:  Betty Boop
Phone: 245-836-8357

7. The following array contains a list of values with duplicates.

@animals=qw( cat dog bird cat bird monkey elephant cat elephant pig horse cat);

a. Remove the duplicates with the built-in map function.

b. Sort the list.

c. Use the built-in grep function to get the index value for the monkey.
