Pattern Matching Odds and Ends

Now that you can match patterns against $_ and you know the basics of substitution, you're ready for more functionality. To be really effective with regular expressions, you need to match against variables other than $_, be able to do sophisticated substitutions, and work with Perl's functions that are geared toward—but not exclusive to—regular expressions.

Working with Other Variables

In Listing 6.2, the weight gathered from the user is stored in $_ and manipulated with substitution operators and matching operators. This listing does have a problem, however: $_ isn't exactly the best variable name to store "weight" in. It's not very intuitive for starters, and $_ might get altered when you least expect it.

Caution

In general, storing anything in $_ for long is playing with fire; eventually, you will get burned. Many of Perl's operators use $_ as a default argument, and some of them modify $_ as well. $_ is Perl's general-purpose variable, and trying to keep a value in $_ for very long (especially after what you learn in Hour 8, "Functions") will cause bugs eventually.


Using a variable called $weight would have been better in Listing 6.2. To use the match operator and substitution operator against variables other than $_, you must bind them to the variable. You do so by using the binding operator, =~, as shown here:

$weight="185 lbs";
$weight=~s/ lbs//;      # Do substitution against $weight

The =~ operator doesn't make assignments; it merely takes the operator on the right and causes it to act on the variable to the left. The entire expression has the same value as it would if $_ were used, as you can see in this example:

$poem="One fish, two fish, red fish";
$n=$poem=~m/fish/;      # $n is true, if $poem has fish
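
As a minimal sketch of the two snippets above in runnable form, the following script binds a substitution and a match to named variables and keeps what each operator returns. The names $changed and $found are illustrative assumptions, not part of Listing 6.2.

```perl
use strict;
use warnings;

my $weight = "185 lbs";

# s/// bound to $weight edits $weight in place and returns the
# number of substitutions it made.
my $changed = ($weight =~ s/ lbs//);

# m// bound to $poem leaves $poem untouched and returns true or false.
my $poem  = "One fish, two fish, red fish";
my $found = ($poem =~ m/fish/) ? "yes" : "no";

print "weight=$weight changed=$changed found=$found\n";
```

Neither statement touches $_, so whatever $_ happens to hold is left alone.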

Modifiers and Multiple Matching

Until now, all the regular expressions you've seen have been case sensitive. That is, upper- and lowercase characters are distinct in a pattern match. To match words and not care about whether they're in upper- or lowercase would require something like this:

/[Mm][Aa][Cc][Bb][Ee][Tt][Hh]/;

This example doesn't just look silly; it's error prone because it would be really easy to mistype an upper-/lowercase pair. The substitution operator (s///) and the match operator (m//) can match regular expressions regardless of case if followed with the letter i:

/macbeth/i;

The preceding example matches Macbeth in uppercase, lowercase, or mixed case (MaCbEtH).
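
A short sketch of the i modifier on both operators follows. The sample data in @titles and $line is made up for illustration.

```perl
use strict;
use warnings;

my @titles = ("Macbeth", "MACBETH", "MaCbEtH", "Hamlet");

# Count the titles that match regardless of case.
my $hits = 0;
for my $title (@titles) {
    $hits++ if $title =~ /macbeth/i;
}

# The i modifier works on s/// as well: find in any case,
# replace with one consistent spelling.
my $line = "MACBETH enters";
$line =~ s/macbeth/Macbeth/i;

print "$hits titles matched; line is now '$line'\n";
```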

Another modifier for matches and substitutions is the global-match modifier, g. The regular expression (or substitution) is done not just once, but repeatedly through the entire string, each match (or substitution) starting immediately after the previous one.

In a list context, the global-match modifier causes the match to return a list of all the portions of the regular expression that are in parentheses:

$_="One fish, two frog, red fred, blue foul";
@F=m/\W(f\w\w\w)/g;

The pattern matches a nonword character, then the letter f, followed by three word characters. The f and the three word characters are grouped by parentheses. After the expression is evaluated, the variable @F will contain four elements: fish, frog, fred, and foul.
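
The sketch below repeats that list-context match against a named variable. It then adds one variation that is not in the original example: a pattern with two sets of parentheses, which returns both captured pieces for every match. The names $text, @f_words, and %kind_of are hypothetical.

```perl
use strict;
use warnings;

my $text = "One fish, two frog, red fred, blue foul";

# List context plus g: one element per match of the parenthesized part.
my @f_words = $text =~ /\W(f\w\w\w)/g;

# Two sets of parentheses: each match contributes two elements,
# so the flat list can be assigned to a hash as key/value pairs.
my %kind_of = $text =~ /(\w+) (f\w\w\w)/g;

print "f-words: @f_words\n";
print "red goes with $kind_of{red}\n";
```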

In a scalar context, the g modifier causes the match to iterate through the string, returning true for each match and false when no more matches are made. Now consider the following:

$letters=0;
$phrase="What's my line?";
while($phrase=~/\w/g) {
    $letters++;
}

The preceding snippet uses the match operator (//) with a g modifier in a scalar context; the condition of while provides the scalar context. The pattern matches a word character. The while loop continues (and $letters gets incremented) until the match returns false. When the snippet is all done, $letters will be 11.
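
The loop above can be written more compactly with a statement modifier. This sketch also runs a second scalar-context loop over the same string to count vowels; the $vowels counter and the vowel pattern are additions for illustration only.

```perl
use strict;
use warnings;

my $phrase = "What's my line?";

# Each pass through the condition resumes where the last match ended.
my $letters = 0;
$letters++ while $phrase =~ /\w/g;

# Once a g match fails, the search position resets, so a second
# loop over the same string starts again from the beginning.
my $vowels = 0;
$vowels++ while $phrase =~ /[aeiou]/gi;

print "letters=$letters vowels=$vowels\n";
```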

Note

You'll find much more efficient ways of counting characters than this presented in Hour 9, "More Functions and Operators."


Backreferences

When you use parentheses in Perl's regular expressions, the portion of the target string matched by each parenthesized expression is remembered. Perl remembers this matched text in special variables named $1 (for the first set of parentheses), $2 (for the second), $3, $4, and so on.

The pattern /(\d{3})-(\d{3})-(\d{4})/ matches well-formed U.S./Canadian telephone numbers (for example, 800-555-1212) and remembers each portion in $1, $2, and $3. These variables can be used after a successful match, as in the following expression:

if (/(\d{3})-(\d{3})-(\d{4})/) {
    print "The area code is $1";
}

Or they can be used as part of the replacement text in a substitution, as follows:

s/(\d{3})-(\d{3})-(\d{4})/Area code $1 Phone $2-$3/;

Be careful, however; the variables $1, $2, and $3 are reset every time a pattern match is successfully performed (regardless of whether it uses parentheses), and the variables are set if and only if the pattern match succeeds completely. Based on this information, consider the following example:

m/(\d{3})-(\d{3})-(\d{4})/;
print "The area code is $1";  # Bad idea.  Assumes the match succeeded.

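In the preceding example, $1 was used without making sure the pattern match worked. If the match fails, $1 is not set by this match; it is either undefined or still holds text left over from an earlier successful match, so the program prints empty or stale data.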
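
One way to stay safe, sketched here under the assumption that the input arrives in an array named @inputs (a made-up name), is to read $1, $2, and $3 only inside the branch that the match guards, and to copy them into named variables right away.

```perl
use strict;
use warnings;

my @inputs = ("800-555-1212", "no number here");
my @report;

for my $input (@inputs) {
    if ($input =~ /(\d{3})-(\d{3})-(\d{4})/) {
        # Safe: this branch runs only when the match succeeded.
        my ($area, $exchange, $number) = ($1, $2, $3);
        push @report, "area code $area";
    } else {
        # The capture variables are never read on this path.
        push @report, "no match";
    }
}

print join("; ", @report), "\n";
```

Copying the captures immediately also protects them from being overwritten by the next successful pattern match.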

A New Function: grep

A common operation in Perl is to search arrays for patterns—for example, if you've read a file into an array and need to know which lines contain a particular word. Perl has one function in particular that you can use in this situation; it's called grep. The syntax for grep is as follows:

grep expression, list
grep block list
						

The grep function iterates through each element in list and executes the expression or block once per element. Within the expression or block, $_ is set to the element being evaluated. If the expression returns true, the element is returned by grep. Consider this example:

@dogs=qw(greyhound bloodhound terrier mutt chihuahua);
@hounds=grep /hound/, @dogs;

In the preceding example, each element of @dogs is assigned, in turn, to $_. The expression /hound/ is then tested against $_. Each of the elements that returns true is returned by grep and stored in @hounds.

You need to remember two points here: First is that within the expression, $_ is an alias for the actual element in the list, not a copy of it. Modifying $_ changes the original element in the list:

@hounds=grep s/hound/hounds/, @dogs;

After running this example, @hounds contains greyhounds and bloodhounds (note the s on the end). The original array @dogs is also modified—by way of changing $_—and it now contains greyhounds, bloodhounds, terrier, mutt, and chihuahua.
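
If the original array must stay intact, one option is to run the modifying grep against a copy. The following is a sketch of that idea; the @copy array is a hypothetical name introduced for the example.

```perl
use strict;
use warnings;

my @dogs = qw(greyhound bloodhound terrier mutt chihuahua);

# Work on a copy so the substitution cannot reach @dogs.
my @copy   = @dogs;
my @hounds = grep { s/hound/hounds/ } @copy;

print "hounds: @hounds\n";
print "dogs:   @dogs\n";
```

Here @copy takes the changes that @dogs would otherwise have received.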

The other point to remember—Perl programmers forget this sometimes—is that grep isn't necessarily used with a pattern match or substitution operator; it can be used with any operator. The following example collects just the names of dogs longer than eight characters:

@longdogs=grep length($_)>8, @dogs;
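
The block form of grep reads well when the test combines several conditions. This sketch also shows grep in a scalar context, where it returns the number of matching elements instead of the elements themselves. The variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @dogs = qw(greyhound bloodhound terrier mutt chihuahua);

# Block form: any code can go between the braces.
my @long_hounds = grep { length($_) > 8 && /hound/ } @dogs;

# Scalar context: grep returns a count rather than a list.
my $long_count = grep { length($_) > 8 } @dogs;

print "long hounds: @long_hounds\n";
print "long names:  $long_count\n";
```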

Note

grep gets its name from a UNIX command by the same name that is used for searching for patterns in files. The UNIX grep command is so useful in UNIX (and hence, in Perl) that in the culture it has become a verb: "to grep." "To grep through a book" means to flip through the pages looking for a pattern.


A related function, map, has an identical syntax to grep, except that the return value from the expression (or block) is returned from map—not the value of $_. You use the map function to produce a second array based on the first. The following is an example:

@words= map { split ' ', $_ } @input;

In this example, each element of the array @input (passed to the block as $_) is split apart on whitespace. This means that each element of @input produces a list of words, and map returns all of those lists joined end to end. The words from each consecutive line of @input therefore accumulate, in order, in @words.
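
A runnable sketch of that map call follows, with made-up contents for @input. It adds a second map that returns exactly one value per element, to contrast with the split case where one element yields several.

```perl
use strict;
use warnings;

my @input = ("One fish two fish", "red fish blue fish");

# One input line can produce many output elements.
my @words = map { split ' ', $_ } @input;

# Here each element produces exactly one value: its length.
my @lengths = map { length($_) } @words;

print scalar(@words), " words: @words\n";
print "lengths: @lengths\n";
```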
