Native Perl Operators

The operators in this section are Perl operators not found in the C language.

Range Operator

The range operator is a pair of dots (..). It has many uses in Perl. Perhaps the most useful is to create a sequence of elements to fill an array. The code below sums the numbers in the range 1 to 100.

@numbers = (1..100); # the numbers 1, 2, 3,...,100
foreach $item (@numbers)
{
    $sum += $item;
}

Rather than making you write out every value in a sequence, the range operator generates the values for you. It works with characters as well as with numbers. The code below prints the lowercase letters followed by the uppercase letters.

foreach $item ('a'..'z', 'A'..'Z')    # quote the endpoints of a character range
{
    print "$item";
}
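A minimal sketch, assuming you want the letters collected in an array rather than printed one at a time; the array name @letters is illustrative. Quoting the endpoints keeps the code valid under use strict, which rejects barewords such as a and z.

```perl
use strict;
use warnings;

# One list of all 52 letters: lowercase first, then uppercase.
my @letters = ('a'..'z', 'A'..'Z');

print scalar(@letters), "\n";            # 52
print join('', @letters[0..4]), "\n";    # abcde (an array slice)
print $letters[-1], "\n";                # Z (the last element)
```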

String Operators

Perl is very good at string manipulation, and it has many string operators and functions. We look at the operators first. The concatenation operator is the dot (.), which makes combining strings a routine matter. The program below reads lines from the standard input, replaces each newline character with the visible sequence NL, and joins all the lines together. See the folder Glue.


% type glue.pl
#
#   glue.pl
#
while($line = <STDIN>)
{
        chomp($line);
        $tot = $tot . $line . "NL";
}
print length($tot), "\n";
print "$tot\n";           # print the long string.
% perl glue.pl
line 1
line 2
line 3
^Z                        # control-z to end the input
24
line 1NLline 2NLline 3NL
%
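The same gluing can be written with the .= assignment operator, which appends to a string in place, or with join. In this sketch the fixed list @lines is a hypothetical stand-in for the lines glue.pl reads from STDIN, so it runs without typed input.

```perl
use strict;
use warnings;

# Hypothetical input, standing in for the lines read from STDIN.
my @lines = ("line 1", "line 2", "line 3");

# Approach 1: .= appends, like $tot = $tot . $line . "NL".
my $tot = "";
foreach my $line (@lines) {
    $tot .= $line . "NL";
}

# Approach 2: join puts "NL" between lines; append the final one by hand.
my $joined = join("NL", @lines) . "NL";

print length($tot), "\n";                          # 24
print "$tot\n";                                    # line 1NLline 2NLline 3NL
print $tot eq $joined ? "same\n" : "different\n";  # same
```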

There is also a repetition operator, x, which repeats a string a given number of times. The program below shows how this operator works. See the folder Repeat.


% type repeat.pl
#
#   repeat.pl
#
print "Enter a string ";
chomp($string = <STDIN>);
print "Enter a repeat factor ";
chomp($thismany = <STDIN>);
$result = $string x $thismany;
print "($string) repeated ($thismany) times is ($result)\n";
% perl repeat.pl
Enter a string hello
Enter a repeat factor 3
(hello) repeated (3) times is (hellohellohello)
%

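Be careful when you use the repetition operator. It treats its left operand as a string, so applying it to a number repeats the number's digits instead of multiplying.

$number = 5;
$number = $number * 4;         # number = 20
$number = $number x 4;         # number = 20202020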
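A short sketch contrasting the two operators: * multiplies its operands as numbers, while x converts its left operand to a string and repeats it. The separator line at the end is an illustrative, common use of x.

```perl
use strict;
use warnings;

my $number   = 5;
my $product  = $number * 4;    # numeric multiplication: 20
my $repeated = $product x 4;   # string repetition: "20" four times

print "$product\n";            # 20
print "$repeated\n";           # 20202020
print "-" x 20, "\n";          # a line of 20 dashes
```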

Table 3-2. Relational Operators

Meaning                           Numeric   String
Greater than                      >         gt
Greater than or equal             >=        ge
Less than                         <         lt
Less than or equal                <=        le
Equal                             ==        eq
Not equal                         !=        ne
Comparison (returns -1, 0, 1)     <=>       cmp

Relational Operators

We've already used some relational operators without giving any specific details. Relational operators have two different flavors: numeric and string. Table 3-2 summarizes which operators are used for string comparisons and which operators are used for numerical comparisons.

As you can see, the == operator is used for numerical equality, whereas eq is used for string equality. Be careful to use the correct operator. The program below prints the square and the square root of each number the user enters. The user signals that he or she is finished by entering the string end.

while(1)
{
        print "input a number ";
        chomp($val = <STDIN>);
        last if ( $val eq "end" );
        print "$val ", $val * $val, " ", sqrt($val), "\n";
}

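In the above code, it is important to use eq rather than == when checking for the ending string. The == operator converts both operands to numbers, and a string such as end, which does not begin with a digit, converts to 0. If you erroneously use ==, an input of 0 will therefore compare equal to end, which is not what you want.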
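A small sketch of why the choice matters, using a hard-coded input of 0 in place of <STDIN>. Perl normally warns when a non-numeric string is used with ==, so numeric warnings are switched off only inside the demo block. The last lines show that numeric and string ordering can disagree as well.

```perl
use strict;
use warnings;

my $val = "0";    # pretend the user typed 0

my $by_number;
{
    no warnings 'numeric';          # hide the "isn't numeric" warning here
    $by_number = ($val == "end");   # true: "end" converts to the number 0
}
my $by_string = ($val eq "end");    # false: the strings differ

print "== says 0 is end: ", ($by_number ? "yes" : "no"), "\n";   # yes
print "eq says 0 is end: ", ($by_string ? "yes" : "no"), "\n";   # no

# Numbers compare by value; strings compare character by character.
my $numeric = (10 < 9)      ? "yes" : "no";   # no
my $string  = ("10" lt "9") ? "yes" : "no";   # yes: "1" sorts before "9"
print "10 < 9? $numeric  \"10\" lt \"9\"? $string\n";
```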

cmp and <=> are similar in nature to the C language strcmp function: they return -1 if the left operand is less than the right operand, 1 if it is greater, and 0 if the operands are equal. cmp compares its operands as strings; <=> compares them as numbers.
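The most common place to see these operators is inside a sort block, where the -1, 0, or 1 result tells sort how to order two elements. This sketch sorts the same illustrative values both ways to show the difference.

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 1);

# sort runs the block with $a and $b set to pairs of elements.
my @by_number = sort { $a <=> $b } @values;   # 1 9 10 100
my @by_string = sort { $a cmp $b } @values;   # 1 10 100 9

print "@by_number\n";
print "@by_string\n";
print join(" ", 3 <=> 5, 5 <=> 5, "b" cmp "a"), "\n";   # -1 0 1
```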

Logical Operators

Logical operators are those that operate on compound conditions. In Perl, logical operators behave as short-circuit operators; that is, they stop evaluating when they have determined the value of the condition. The logical operators are

Logical not          !
Logical and          &&
Logical or           ||

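You may also use the English words not, and, and or as synonyms for the above operators, respectively. The word forms have much lower precedence than the symbols (lower even than assignment), so the two forms are not always interchangeable.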
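A sketch of how the precedence difference shows up in practice: the same expression is assigned once with || and once with or. The variable names are illustrative. It also shows that || returns the value of the operand that decided the result, not just 1 or 0.

```perl
use strict;
use warnings;

my $high = 0 || 5;        # $high = (0 || 5), so $high is 5

my $low;
{
    no warnings 'void';   # Perl would warn that the constant 5 is unused
    $low = 0 or 5;        # ($low = 0) or 5, so $low is 0
}

print "high=$high low=$low\n";   # high=5 low=0
```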

Short-circuit operators add efficiency to programs. For example, if $x has the value 0, then it would be inefficient to evaluate the second and third comparisons of the compound condition below.

if ( $x == 0 || $y == 0 || $z == 0)
{
# some code here
}

Sometimes this can make a difference in the flow of data through your program. For example, in the code below, a line is read only if $x does not have the value 0.

if ($x == 0 || ($y = <STDIN>) )
{
# some code here
}

In this particular example, the inner parentheses are necessary. To see why, first notice that there are three operators in the condition. Without the inner parentheses, they are evaluated in the order ==, ||, and =, as if the condition had been parenthesized as

if ((($x == 0) || $y ) = <STDIN> )

The || operator produces a value (the value of whichever operand decided the result), not a variable, so it cannot be the recipient of the line read from <STDIN>. Perl rejects the statement when it compiles the program.
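A minimal sketch that makes short-circuiting visible. The subroutine check_y is hypothetical; it stands in for the <STDIN> read in the example above and counts how often it runs. When $x is 0, the right-hand operand of || is never evaluated.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for an expensive or side-effecting test.
sub check_y {
    $calls++;
    return 1;
}

foreach my $x (0, 7) {
    if ($x == 0 || check_y()) {
        # some code here
    }
}

print "check_y ran $calls time(s)\n";   # 1: skipped when $x was 0
```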

Here are a few examples of the use of the logical operators.

print "$a is between 0 and 9" if ( $a >= 0 && $a <= 9);
print "$a is between 0 and 9" if ( $a >= 0 and $a <= 9);

print  "$a is not a digit" if ( ! ($a >= 0 && $a <= 9));
print  "$a is not a digit" if ( not ($a >= 0 && $a <= 9));

print  "$a is zero or one" if ( $a == 0 || $a == 1);
print  "$a is zero or one" if ( $a == 0 or $a == 1);

Regular Expression Operators

A regular expression is a pattern, built from ordinary characters and metacharacters, that is used to match strings. Every string either matches a particular regular expression or it doesn't. Regular expressions are usually enclosed within a pair of slashes. For example, if you wanted to determine whether the variable $name contains the pattern tele, you would code as follows.

print "$name matches tele\n" if ( $name =~ /tele/ );

All matches are case sensitive. If you wish to make a match case insensitive, place an i after the closing delimiter of the pattern.

print "$name matches tele\n" if ( $name =~ /tele/i );

You can also use other characters, such as # or !, to delimit the pattern. In these cases, the match operator m is required. Thus, the following three statements are different ways of encoding the same test.

print "$name matches tele\n" if ( $name =~ /tele/i );
print "$name matches tele\n" if ( $name =~ m!tele!i );
print "$name matches tele\n" if ( $name =~ m#tele#i );

If you want to match against the special variable $_, then you can omit the =~ part of the test. Thus, you often see code such as

while(<STDIN>)             # read a line into $_
{
    print if /Susan/;      # print $_ if $_ contains Susan
}
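A sketch pulling these points together, with illustrative data. grep sets $_ to each element in turn, so the patterns need no =~; the i modifier ignores case; and an alternative delimiter saves escaping when the pattern itself contains slashes, as in the hypothetical path below.

```perl
use strict;
use warnings;

my @names = ("Telephone", "telescope", "Phone");

# Case sensitive: "Telephone" starts with a capital T, so /tele/ misses it.
my @exact = grep { /tele/ } @names;

# The i modifier ignores case; here # is the delimiter, so m is required.
my @any_case = grep { m#tele#i } @names;

# Slashes inside the pattern need no backslashes with # as the delimiter.
my $path = "/usr/local/bin/perl";
my $in_local = ($path =~ m#^/usr/local/#) ? 1 : 0;

print "@exact\n";      # telescope
print "@any_case\n";   # Telephone telescope
print "$in_local\n";   # 1
```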

In addition to simply finding matches, you can also make a replacement if a match is found. In this case, you must use the s operator. This operator requires a target regular expression and a replacement string. In each example below, if $name matches the string Mike, then it is replaced with Michael.

print $name if ( $name =~ s/Mike/Michael/i );
print $name if ( $name =~ s#Mike#Michael#i );
print $name if ( $name =~ s!Mike!Michael!i );

Remember that the sequence \t represents the tab character and the sequence \n represents the newline character. Each of these two-character sequences stands for a single character, in patterns as well as in double-quoted strings. If you wish to make the substitution in the string $_, you can code as follows.

print if ( s/Mike/Michael/i );
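The s operator returns a true value (the number of substitutions made) when it changes the string and a false value otherwise, which is why it can serve as the condition of an if. This sketch, with illustrative names, applies it to $_ inside a foreach loop; because $_ is an alias for each array element, the substitution updates the array itself.

```perl
use strict;
use warnings;

my @names = ("mike jones", "Sue Smith");
my $changed = 0;

foreach (@names) {              # $_ is an alias for each element in turn
    if (s/Mike/Michael/i) {     # substitutes in $_, which updates @names
        $changed++;
        print "$_\n";           # Michael jones
    }
}
print "$changed of ", scalar(@names), " names changed\n";   # 1 of 2 names changed
```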

Regular Expression Metacharacters

In the above regular expression examples, simple matches were explained. Regular expression matching is usually more complicated than that and often involves characters with special meanings. For example, the ^ and $ characters are used to find matches at the beginning and at the end of a string respectively.

print  if ( /sub/ );  # print $_ if it contains sub
print  if ( /^sub/ ); # print $_ if it contains sub at
                      # beginning of line
print  if ( /sub$/ ); # print $_ if it contains sub at
                      # end of line
print  if ( /^sub$/ );# print $_ if it contains only sub
                      # (i.e., the line consists of sub alone)
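A sketch applying the four patterns above to a few illustrative strings. grep sets $_ to each element in turn, just as the while(<STDIN>) loop does for each input line.

```perl
use strict;
use warnings;

my @lines = ("sub", "sub main", "my sub", "subroutine sub", "nothing");

my @anywhere = grep { /sub/ }   @lines;
my @at_start = grep { /^sub/ }  @lines;
my @at_end   = grep { /sub$/ }  @lines;
my @only     = grep { /^sub$/ } @lines;

print join(", ", @anywhere), "\n";   # sub, sub main, my sub, subroutine sub
print join(", ", @at_start), "\n";   # sub, sub main, subroutine sub
print join(", ", @at_end), "\n";     # sub, my sub, subroutine sub
print join(", ", @only), "\n";       # sub
```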

The ^ and the $ are referred to as anchors because they anchor the match to a particular place, i.e., the beginning or the end of a string.

Other special characters are referred to as single-character matches because they match a single character. The dot (.) is one of these: it matches any character except a newline. For example, the pattern /th.s/ might be expressed as “any four-character sequence whose first, second, and fourth characters are t, h, and s, respectively, and whose third character is any single character.” Each of the following strings matches the pattern.

this, these, those, eleventh series

Note that in the example above, the dot was matched by the space character.
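A quick sketch for checking candidate strings against /th.s/. The word thus is an extra illustrative example; the word that is included to show a near miss, since its fourth character is t rather than s.

```perl
use strict;
use warnings;

my @words = ("this", "that", "these", "those", "eleventh series", "thus");

foreach my $w (@words) {
    my $result = ($w =~ /th.s/) ? "matches" : "no match";
    print "$w: $result\n";
}

my @matching = grep { /th.s/ } @words;   # every word except "that"
```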

The brackets match any single character contained within the brackets. For example, the pattern /th[iae]s/ is similar to the previous pattern, but the third character is limited to an i, a, or e. Bracketed expressions have some shorthand notations as well. For example, [0123456789] can be encoded as [0-9], and the lowercase alphabet can be encoded as [a-z]. Thus, the following expression matches a capital A followed by a digit, a lowercase letter, and a capital Z.

/A[0-9][a-z]Z/

Some characters are multicharacter matches. For example, the + character allows you to express the idea of “one or more” of something. In the example below, the pattern is one or more digits followed by one or more white space characters followed by one or more alphanumeric characters.

/[0-9]+[ \t\n]+[a-zA-Z0-9]+/

In addition, Perl provides the following shorthand character classes.

\w  Match a "word" character (alphanumeric plus "_")
\W  Match a non-word character
\s  Match a whitespace character
\S  Match a non-whitespace character
\d  Match a digit character
\D  Match a non-digit character
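A sketch comparing the bracketed "digits, whitespace, alphanumerics" pattern from earlier with a version written using the shorthand classes. The test strings are illustrative. The two patterns agree on these samples, but they are not identical in general: \w also matches an underscore, and \s also matches whitespace such as carriage returns and form feeds.

```perl
use strict;
use warnings;

my @tests = ("42 apples", "7\t\tx9", "42apples", "abc 42");

# The explicit bracketed pattern...
my @explicit  = grep { /[0-9]+[ \t\n]+[a-zA-Z0-9]+/ } @tests;
# ...and a shorthand version of the same idea.
my @shorthand = grep { /\d+\s+\w+/ } @tests;

print scalar(@explicit), " ", scalar(@shorthand), "\n";   # 2 2
```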

Another important metacharacter is the ?. The question mark signifies an optional item: that is, zero or one of something. The regular expression below matches a string that consists of an optional plus or minus sign followed by one or more digits, and nothing else.

/^[+-]?\d+$/
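A sketch using this pattern to pick signed integers out of some illustrative inputs. Note that $ also matches just before a trailing newline, so input read from STDIN should still be chomped before the value is used.

```perl
use strict;
use warnings;

my @inputs = ("42", "+42", "-7", "4.2", "+-3", "12a", "");

# Optional sign, then one or more digits, anchored at both ends.
my @integers = grep { /^[+-]?\d+$/ } @inputs;

print "@integers\n";   # 42 +42 -7
```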

The * character means zero or more of something. Combined with the . character, as in .*, it means “any sequence of characters.” Because * matches as much as it can, the following regular expression matches, starting at the first digit, the longest string that ends with a lowercase letter.

/[0-9].*[a-z]/
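A sketch showing which text this pattern actually matches, using an illustrative string. The parentheses around the pattern capture the matched text so it can be printed; a match in list context returns the captured text.

```perl
use strict;
use warnings;

my $text = "x1abc2def";

# .* grabs as much as it can, then backs off until [a-z] can match.
my ($matched) = $text =~ /([0-9].*[a-z])/;

print "$matched\n";   # 1abc2def
```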

Remembered Patterns

Often, you will want to extract a portion of a match. For example, you may want to extract the area code if a string contains a telephone number. The following two patterns match a telephone number in the United States.

/\d\d\d-\d\d\d-\d\d\d\d\D/
/((\d\d\d)-\d\d\d-\d\d\d\d)\D/

In the second pattern, there are two sets of parentheses. The inner set surrounds the area code portion of the match, while the outer set surrounds the entire phone number. The parentheses do not match any characters themselves; rather, if there is a match, they tell Perl to remember that portion of it. You can print the remembered patterns by using the special variables $1, $2, and so on. Thus, the following program prints area codes and phone numbers from any telephone numbers that are found. See the folder Tele.


% type tele.pl
#
#
#
while(<STDIN>)
{
        print "Area code for $1 is $2\n"
                if /((\d\d\d)-\d\d\d-\d\d\d\d)\D/;
}
% perl tele.pl
My phone number is 301-555-1212
Area code for 301-555-1212 is 301
What is yours
Glad you asked.  Mine is 401-555-1213
Area code for 401-555-1213 is 401
%
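Instead of reading $1 and $2 afterward, you can assign the remembered patterns directly to variables: in list context, a successful match returns the captured strings in order, and a failed match returns an empty list, which is false. In this sketch the fixed @lines array is a hypothetical stand-in for the input typed to tele.pl.

```perl
use strict;
use warnings;

# Sample input lines, standing in for what tele.pl reads from STDIN.
my @lines = ("My phone number is 301-555-1212\n",
             "What is yours\n",
             "Glad you asked.  Mine is 401-555-1213\n");
my @areas;

foreach my $line (@lines) {
    # The list assignment is true only when the match succeeds.
    if (my ($number, $area) = $line =~ /((\d\d\d)-\d\d\d-\d\d\d\d)\D/) {
        print "Area code for $number is $area\n";
        push @areas, $area;
    }
}
```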
