11.2 Perl

Perl is commonly expanded as "Practical Extraction and Report Language". It is a general-purpose programming language originally developed for text manipulation and now used for a wide range of tasks, including system administration, web development, network programming and GUI development. It borrows several ideas from natural languages:

  • Learn it once, use it many times
  • Learn as you go
  • Many acceptable levels of competence
  • Multiple ways to say the same thing
  • Style not enforced
  • Cooperative design

The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features are that it is easy to use, supports both procedural and object-oriented (OO) programming, has powerful built-in support for text processing, and has one of the largest collections of third-party modules of any language (CPAN, the Comprehensive Perl Archive Network). Perl is highly portable: it runs on Windows, Linux, Mac OS X, Solaris, the BSDs, other Unix systems and more.

Our intention here is not to teach the Perl language, but to discuss some of its properties that are relevant to its implementation.

A Perl script or program consists of one or more statements, simply written one after another in the script. There is no need for a main() function or anything of that kind.

The “Hello World” program is simply:

#!/usr/bin/perl
print "Hello, world\n";

Here the first line, a special comment called the "shebang" line, makes this little program directly executable as a command on Unix-like systems (once the file has been given execute permission).

Perl is usually included as a standard component of present-day Unix-like operating system distributions. Extensive documentation is available for Perl, starting with perltoc (the table of contents), which is usually read with the perldoc command; perldoc can locate and display the Perl documentation installed on a system.

Perl looks rather odd at first sight. But once you have learned it, you can usually write programs in Perl much faster than in lower-level languages such as C. This makes Perl especially suitable for writing programs that are used only once.

The following simple Perl program reads lines of text from STDIN (or from files named on the command line), each containing a person's name such as "John Smith", and prints each name with the first and second names swapped and separated by a comma (e.g. "Smith, John").

while (<>) {
    my @name = split;              # split the current line ($_) on whitespace
    print "$name[1], $name[0]\n";
}

Compare this with a C program that does the same job, which needs explicit code to read lines, split them into words and manage buffers.

Perl is typically implemented as a partly compiled, partly interpreted language, so a Perl program tends to execute more slowly than a corresponding C program. On the other hand, computers keep getting faster, and writing a program in Perl instead of C tends to save programmer time. Moreover, in text-processing tasks much of the work is done inside Perl's built-in routines (such as the regular expression engine and the I/O layer), which are themselves written in C, so the speed gap can be small.

We shall extend the “Hello World” Perl program a bit:

#!/usr/bin/perl
print "What is your name: ";
$name = <STDIN>;
chomp($name);
print "Hello, $name!\n";

$name is a scalar variable, i.e. it holds a single value; the leading "dollar" sign marks it as a scalar. <STDIN> reads one line from the standard input, including its terminating newline. chomp removes a trailing newline (strictly, the current input record separator $/, which is a newline by default) if there is one, so "John\n" becomes "John".

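Note the variable inside the double-quoted string of the last print statement: Perl substitutes its value there. This is called variable interpolation; it happens in double-quoted strings, not in single-quoted ones.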
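
The following is a minimal non-interactive sketch of the same steps. It assumes the input line is passed in as a string (with the trailing newline that <STDIN> would leave on it) so that it can run unattended; the subroutine name greet is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a line as <STDIN> would return it
# (with its trailing newline) and builds the greeting.
sub greet {
    my ($line) = @_;
    chomp($line);              # drop the trailing "\n", if any
    return "Hello, $line!";    # $line is interpolated into the string
}

print greet("John\n"), "\n";   # prints: Hello, John!
print greet("Mary"), "\n";     # no newline to remove, so chomp changes nothing
```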

Perl has only three basic data types:

Scalar: variables holding a single value, be it an integer, a floating-point number, a string (a single character is simply a one-character string) or a reference.

Array: denoted by a variable name prefixed by the @ symbol, for example @lines; it is an ordered list of scalars. Individual elements are accessed by the usual zero-based indexing, with a $ prefix because each element is a scalar: for example, $lines[5] accesses the sixth element of @lines. Functions such as push, pop, shift and unshift make it easy to use arrays as stacks or queues.

Hash: denoted by a variable name prefixed by the % symbol, for example %pairs; it is an associative array of (key => value) pairs, whose elements are accessed as $pairs{key}.
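
A short sketch of the three types side by side, including push, pop and shift used to treat an array as a stack or as a queue; the variable names and values are illustrative only.

```perl
use strict;
use warnings;

my $count    = 3;                           # scalar holding a number
my $greeting = "hi";                        # scalar holding a string
my @lines    = ("a", "b", "c");             # array of scalars
my %pairs    = (apple => 1, pear => 2);     # hash of key => value pairs

print "$greeting, $count items\n";
print "$lines[0] $lines[-1]\n";             # first and last element: a c

# Array as a stack: push and pop both work at the end.
push @lines, "d";                           # (a, b, c, d)
my $top = pop @lines;                       # removes and returns "d"

# Array as a queue: push at the end, shift from the front.
push @lines, "e";                           # (a, b, c, e)
my $front = shift @lines;                   # removes and returns "a"; (b, c, e) remain

# A single hash element is a scalar, hence the $ prefix.
$pairs{plum} = 3;
print "pear => $pairs{pear}\n";
print "keys: ", join(",", sort keys %pairs), "\n";   # keys: apple,pear,plum
```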

Variables need not be declared before use, though the use strict pragma can be used to require that they be declared (typically with my). Similarly, use warnings makes Perl warn, among other things, when an uninitialized (undefined) value is used. Memory for newly created values is allocated automatically and reclaimed by reference counting: a value is freed as soon as the last reference to it disappears. Circular data structures are not freed this way unless one of the links in the cycle is made a weak reference.
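
To make reference counting visible, the sketch below uses a hypothetical class Tracker whose DESTROY method (which Perl calls when an object is freed) records an event. It assumes the core module Scalar::Util for weaken, one standard way to break a reference cycle.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @log;    # records events in the order they happen

# Hypothetical class whose only job is to report when Perl frees an object.
package Tracker;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @log, "freed $self->{name}" }

package main;

{
    my $obj   = Tracker->new("A");   # one reference to the object
    my $alias = $obj;                # two references
    undef $obj;                      # back to one: the object stays alive
    push @log, "after undef";
}                                    # $alias goes out of scope: count 0, DESTROY runs here
push @log, "after block";

{
    my $node = Tracker->new("B");
    $node->{self} = $node;           # a cycle: the object refers to itself
    weaken($node->{self});           # a weak reference is not counted
}                                    # so the object is still freed here
push @log, "after cycle";

print "$_\n" for @log;
```

Without the weaken call, object B would keep itself alive until the program ends.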

A lexically scoped variable is declared with my, which limits the variable's scope to the enclosing block (or to the file, if it is declared outside any block).
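
A small illustration of lexical scoping with my; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $x = "outer";
my @seen;

{
    my $x = "inner";          # a new lexical that hides the outer $x in this block
    push @seen, $x;           # "inner"
}
push @seen, $x;               # the outer $x is visible again: "outer"

for my $i (1 .. 3) {          # $i exists only inside the loop
    push @seen, $i;
}
# Referring to $i here would be a compile-time error under "use strict".

print join(" ", @seen), "\n";  # prints: inner outer 1 2 3
```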

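Subroutines in Perl are named code blocks, defined with the syntax sub nameOfSub { ... }; the special array @_ holds the arguments passed in. return() passes a value back to the caller, either a scalar or a list; an array or a hash is returned as the list of its elements (a hash as a flattened list of key/value pairs). Without an explicit return, a subroutine returns the value of the last expression evaluated.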
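
The sketch below, with a hypothetical subroutine min_max, shows arguments arriving in @_ (arrays passed as arguments are flattened into that single list) and a list being returned and unpacked by the caller. A second hypothetical subroutine, server_config, returns a hash as its key/value list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the smallest and largest of its arguments.
sub min_max {
    my @nums = @_;                        # copy the flattened argument list
    my ($min, $max) = ($nums[0], $nums[0]);
    for my $n (@nums) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max);                  # return a two-element list
}

my @left  = (4, 9);
my @right = (1, 7);
my ($lo, $hi) = min_max(@left, @right, 5);   # @_ is (4, 9, 1, 7, 5)
print "$lo $hi\n";                           # prints: 1 9

# Hypothetical helper: a hash is returned as its key/value list.
sub server_config {
    my %h = (host => "localhost", port => 8080);
    return %h;
}
my %cfg = server_config();                # the caller rebuilds the hash
print "$cfg{host}:$cfg{port}\n";          # prints: localhost:8080
```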

An important concept in Perl is context, which controls how an expression is interpreted. There are two main contexts, scalar and list, and this distinction is one of Perl's most characteristic features. For example, if @lines is an array, the statement $nline = @lines; evaluates @lines in scalar context, imposed by the scalar on the left-hand side of the assignment. An array in scalar context yields its number of elements, so $nline is assigned the size of @lines. On the other hand, in the statement ($e1, $e2, $e3) = @lines; the list on the left-hand side imposes list context, in which an array yields its elements, so $e1, $e2 and $e3 are assigned the first three elements of @lines, respectively.
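
The example from the text can be run as written. The sketch also uses the built-in wantarray, which lets a subroutine find out whether it was called in list or scalar context; context_of is a hypothetical name.

```perl
use strict;
use warnings;

my @lines = ("first", "second", "third", "fourth");

my $nline = @lines;                 # scalar context: the number of elements, 4
my ($e1, $e2, $e3) = @lines;        # list context: the first three elements
my $msg = "count: " . @lines;       # the . operator also imposes scalar context

# A subroutine can ask which context it was called in with wantarray.
sub context_of { return wantarray ? "list" : "scalar" }
my @in_list   = context_of();       # list assignment: ("list")
my $in_scalar = context_of();       # scalar assignment: "scalar"

print "$nline | $e1 $e2 $e3 | $msg | $in_list[0] $in_scalar\n";
# prints: 4 | first second third | count: 4 | list scalar
```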

Conversion between scalar representations (integer, floating point, string) happens automatically, depending on how the value is used.
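
A few illustrative conversions. Note that it is the operator (+ or ., == or eq) rather than the stored form of the value that decides whether a scalar is treated as a number or as a string.

```perl
use strict;
use warnings;

my $sum    = "10" + 5;        # the string "10" used as a number: 15
my $joined = 7 . "";          # the number 7 used as a string: "7"
my $half   = 7 / 2;           # integer operands, floating-point result: 3.5
my $num    = "3.0" + 0;       # numeric value 3, which prints as "3"

print "$sum $joined $half $num\n";   # prints: 15 7 3.5 3

# == compares numerically, eq compares strings.
print "10" == 10.0 ? "numerically equal\n" : "different\n";
print "10" eq "10.0" ? "same string\n" : "different strings\n";
```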

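The arithmetic and relational operators are similar to those of C, with additional operators for strings, such as . (concatenation), x (repetition) and eq, ne, lt, gt and cmp (comparison). There is a very rich set of string manipulation functions, and regular expressions are built into the language.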
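
A brief tour of these string operators, plus a regular expression match that splits a name into two captured words; the data are illustrative.

```perl
use strict;
use warnings;

my $s       = "Perl" . " " . "rules";   # . concatenates: "Perl rules"
my $line    = "-" x 10;                 # x repeats a string: ten dashes
my @order   = sort { $a cmp $b } ("pear", "Apple", "fig");   # cmp: string ordering
my $is_less = "abc" lt "abd";           # lt, gt, le, ge, eq, ne compare strings
my $num_cmp = 10 <=> 9;                 # <=> is the numeric counterpart of cmp: 1

# Regular expressions are part of the syntax: capture two words.
my ($first, $second) = "John Smith" =~ /^(\w+)\s+(\w+)$/;

printf "%s | %s | %s | %d | %d | %s, %s\n",
    $s, $line, join(",", @order), $is_less, $num_cmp, $second, $first;
# prints: Perl rules | ---------- | Apple,fig,pear | 1 | 1 | Smith, John
```

Uppercase letters sort before lowercase ones here because cmp compares character codes.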

Perl has a rich set of control constructs, comparable to and going beyond those of most programming languages. For example, a conditional can be written in two ways, as if (condition) { statements } or, for a single statement, as statement if condition; (a so-called statement modifier), which often reads more naturally. The elements of an array can be traversed very simply as:

for $info (@data) {
    print $info, "\n";
}

which successively assigns the elements of the array @data to $info, and the print statement prints each one on its own line. Similarly, the keys and values of a hash can be accessed as:

for $key (sort keys %data) {
    print "$key $data{$key}\n";
}

which extracts the keys of the hash %data, sorts them and assigns each in turn to $key. The print statement prints each key and its corresponding value, $data{$key}.

There are even more compact functions in Perl for processing the elements of a list, for example:

@files = map {lc($_)} <*.java>;

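builds the list @files containing the lowercased name of each Java source file in the current directory (the files themselves are not renamed). The curly braces may contain arbitrary operations to be performed on each element of the list (here, the list of Java source file names produced by the glob <*.java>), with $_ standing for the current element.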
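
To keep the example independent of the current directory, the sketch below uses a fixed list of file names in place of the glob <*.java> (an assumption for illustration), filters it with grep and transforms it with map.

```perl
use strict;
use warnings;

# A fixed list stands in for the result of the glob <*.java>.
my @names = ("Main.java", "Util.JAVA", "README.md", "Parser.java");

my @java  = grep { /\.java$/i } @names;          # keep elements for which the block is true
my @lower = map  { lc($_) } @java;               # transform each element ($_)
my %len   = map  { ($_ => length $_) } @lower;   # map may return several values per element

print "@lower\n";                # prints: main.java util.java parser.java
print "$len{'main.java'}\n";     # prints: 9
```

The parentheses inside the last map block make it unambiguous to Perl that the braces enclose a block rather than an anonymous hash.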

11.2.1 Perl Internals

Perl is neither exactly an interpreter nor exactly a compiler. It first compiles the source code into an internal representation, a tree of operations (opcodes) often loosely called byte-code, and then executes the operations this representation specifies on a virtual machine.

Java's virtual machine is designed to represent an idealized version of a computer's processor. Perl's virtual machine is similar in principle, but its individual operations are considerably higher level. For instance, a whole regular expression match is a single "instruction" in Perl's virtual machine. Perl uses a stack to pass arguments and results between operations.

The first stage is "parsing": its input is your Perl source code and its output is a tree data structure (the op tree) that represents what that code "means". One node of this tree is designated as the start node. Every node has an operation to perform and contains a pointer (op_next in the C sources) to the node that the interpreter must execute next. The second phase is therefore to execute the start node and follow the chain of pointers around the tree, executing each operation in the correct order.
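
The execution chain can be observed from Perl itself with the core B module: B::main_start returns the start node of the main program, and each op object's next method follows the pointer to the following op. This is a sketch for exploration; the exact op names depend on the Perl version (newer perls fuse some ops together), so the output is not reproduced here, although the chain typically begins with enter and ends with leave.

```perl
use strict;
use warnings;
use B ();

# Walk the main program's ops in execution order: start at the start node
# returned by B::main_start and follow each op's "next" pointer.
sub main_op_names {
    my @names;
    my $op = B::main_start();
    # A B::NULL object (whose address $$op is 0) marks the end of the chain;
    # the 10_000 limit is an arbitrary safety cap for this sketch.
    while ($$op && @names < 10_000) {
        push @names, $op->name;
        $op = $op->next;
    }
    return @names;
}

my $x = 2;
my $y = $x * 21;
print join(" ", main_op_names()), "\n";
```

For a fuller view of the same tree, the core B::Concise backend (run as perl -MO=Concise,-exec script.pl) prints every op in execution order.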

Internal Data Types

As the tree of opcodes constituting a compiled Perl program is executed, Perl values are created, manipulated and destroyed. Each Perl data type has a corresponding data type in the C code that implements Perl.

Three C typedefs correspond to Perl's three basic data types, and there are some additional types:

SV (scalar value)

AV (array value)

HV (hash value)

IV is a simple signed integer type, guaranteed to be large enough to hold a pointer as well as an integer.

I32 and I16 are integer types guaranteed to be at least 32 and 16 bits wide, respectively.

UV, U32 and U16 are the unsigned counterparts of IV, I32 and I16.

All of these typedefs can be manipulated with the C functions described in the perlguts documentation. The behaviour of some of those functions is briefly discussed below.

  • There are five types of values that can be copied into an SV: an integer (IV), an unsigned integer (UV), a double (NV), a string (PV) and another scalar (SV). There are dozens of functions for SVs that let you create, modify and grow them, and check the truth or definedness of the Perl scalars they represent. Perl references were historically implemented as an RV, a special type of SV; since Perl 5.12 the separate RV type is gone and a reference is stored in an ordinary SV with its ROK flag set.
  • When an AV is created, it can be created empty or populated with SVs, which makes sense since an array is a collection of scalars.
  • The HV has associated C functions for storing, fetching, deleting and checking the existence of key/value pairs in the hash the HV represents.
  • There is also a GV (glob value), which can hold references to any of the values associated with a variable identifier: a scalar value, an array value, a hash value, a subroutine, an I/O handle or a format.
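
These C-level types are also visible through the core B module, whose svref_2object function wraps a Perl value in an object of a class named after its internal type (B::IV, B::PV, B::AV, B::HV, B::CV, B::GV and so on). The sketch also reads individual slots of a typeglob with the *name{THING} syntax; the variable and subroutine names are arbitrary. A scalar's internal type can be upgraded as it is used (for example when a number is also used as a string), so the class reported for a scalar depends on its history.

```perl
use strict;
use warnings;
use B ();

# Return the name of the B class that mirrors a value's C-level type.
sub sv_class { return ref(B::svref_2object($_[0])) }

my $int  = 42;               # an integer-only scalar
my $str  = "hello";          # a string scalar
my @arr  = (1, 2, 3);
my %hash = (k => "v");
sub hello_sub { return "hi" }

our $shared = "s";           # package variables are reached through a glob (GV)
our @shared = (1, 2);

print "int:  ", sv_class(\$int), "\n";        # B::IV
print "str:  ", sv_class(\$str), "\n";        # B::PV
print "arr:  ", sv_class(\@arr), "\n";        # B::AV
print "hash: ", sv_class(\%hash), "\n";       # B::HV
print "code: ", sv_class(\&hello_sub), "\n";  # B::CV
print "glob: ", sv_class(\*shared), "\n";     # B::GV

# *name{THING} returns a reference to one slot of the glob.
my $scalar_slot = *shared{SCALAR};            # reference to $shared
my $array_slot  = *shared{ARRAY};             # reference to @shared
print "slots: $$scalar_slot @$array_slot\n";  # prints: slots: s 1 2
```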

For further details, see the perlguts documentation.
