
CS 460/594 Spring 1999 Ron Jurincie
(Edited by M. Berry)
Note: This tutorial is only intended to provide a very simplified
introduction to writing scripts in perl. Use the links provided
within this document, as well as the ones on the course homepage
to further your understanding.
Perl (Practical Extraction and Report Language) is a scripting language
first released by Larry Wall in 1987. Wall created perl because he wanted
a language that combined many of the best features of sed, awk, sh and
other scripting languages.
Wall made his new scripting language freely available from the start,
allowing other programmers to use it as they liked. Soon, many programmers
began sharing features they had developed with other perl users. Some of
these features found their way into later versions of perl, and the
current version (Perl 5.0) is a complete rewrite of earlier versions.
Perl's original 15 man pages bear little resemblance to today's extensive
collection. In addition to the local man pages, you can also learn about
perl from online tutorials, newsgroups, mailing lists, and FAQs.
Our man pages can be accessed as follows:
Unix> man perl
Perl's on-line man pages are available here.
Two lectures from an excellent on-line Perl tutorial written by K. Buckner
for CS291 are available: Lecture1, Lecture2.
Perl is an Interpreted Language
Perl is an interpreted language. Rather than producing object code in a
separate compile step, perl compiles its source code into an internal parse
tree each time a script is run, and then executes that tree immediately.
This makes the development cycle of perl scripts much quicker than with
compiled languages. However, perl scripts generally run more slowly than
compiled object code.
Because of its rapid development time, perl is often used as a prototyping
language for large software projects. Perl allows programmers to develop
simplified versions of projects, which can later be rewritten in faster
compiled languages.
Perl uses sophisticated pattern-matching techniques to swiftly scan
large amounts of textual (or binary) data.
Perl can read and write to TCP/IP sockets.
Perl is free and readily available.
Perl help is easily available.
Perl is an excellent CGI scripting language.
Perl can be used to automate FTP file retrieval.
Perl has specialized extensions for handling Oracle and other popular
databases.
Perl is relatively easy to learn, especially for programmers familiar with
C and C++. Much of perl's syntax is similar to that of C. Perl can make
simple programs much shorter, and easier to write. Take a look at C
code compared to perl code for the ubiquitous Hello World program.
C code:
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

Perl code:
print "Hello World\n";
See how easy simple perl scripts can be?
Now let's look at some perl syntax.
Comments are preceded by the # character and continue until the end
of the line. They can appear at the beginning of a line, or after perl code.
$J = 333;    # assigns the value 333 to the scalar variable $J
# this is a comment too
There are many ways to perform loops, and if / else operations. Refer to
perl's man pages for a complete list.
Perl supports C's for loop, as well as the while statement, along with a
foreach structure that steps through each element of a list in turn.
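The three loop forms above, plus an if / else test, can be sketched as follows. The array @names and its contents are made-up sample data, not taken from the course files.

```perl
use strict;
use warnings;

my @names = ("Vose", "Berry", "Vander Zanden");   # hypothetical sample data

# C-style for loop: initialize; test; increment
for (my $i = 0; $i < @names; $i++) {
    print "name $i is $names[$i]\n";
}

# while loop: repeats as long as its condition is true
my $n = 0;
while ($n < 3) {
    $n++;
}
print "while loop counted to $n\n";

# foreach loop: $name is set to each element of @names in turn
my ($long, $short) = (0, 0);
foreach my $name (@names) {
    if (length($name) > 4) {    # if / else chooses between two branches
        $long++;
    } else {
        $short++;
    }
}
print "long=$long short=$short\n";
```

In the for loop, @names appears in a numeric comparison, so it stands for the number of elements in the array.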
Perl programs can be executed by typing perl followed by the filename. To
execute our Hello World script, one simply enters:
Unix> perl hello_world.perl
On our Unix system, perl 5.0 is installed as /usr/local/bin/perl5.
Perl can be made to issue warnings about possible execution problems by
using the -w switch before the filename on the command line.
For example:
Unix> perl -w hello_world.perl
Similarly, the -d switch activates the perl debugger, which lets you step
through the program line by line, set breakpoints, and examine variable
values.
Perl's 3 basic variable types are scalars, arrays and hashes. Unlike
C, perl variables do not need to be declared prior to their use.
Scalars
Scalar variables are perl's most primitive variable type. A scalar can hold
a number, a string of characters, or a string of digits, and perl converts
between these automatically depending on how the value is used. Scalar
variable names are preceded by the $ character.
Numbers are stored internally as either integers or double-precision
floating-point numbers, depending on context.
Here are some number assignment examples:
$a = 4;
$count = 0;
$val = 1.22345;
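Because perl converts between numbers and strings automatically, the same scalar can be used either way. A small sketch; the variable names are illustrative, and the . (string concatenation) operator is not otherwise covered in this tutorial.

```perl
use strict;
use warnings;

my $str_num = "42";          # a string of digits
my $sum     = $str_num + 8;  # used as a number, so $sum is 50
my $joined  = 7 . 3;         # . joins its operands as strings: "73"
my $half    = 7 / 2;         # division can produce a floating-point value: 3.5

print "sum=$sum joined=$joined half=$half\n";
```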
Strings are usually delimited by single or double quotes. Double-quoted
strings are subject to backslash and variable interpolation, while
single-quoted strings are not. The most common backslashed characters
are \n for a newline and \t for a tab (look familiar?). Refer to perl's
man pages for more information.
Here are some string assignment examples:
$a = "hello";
$letter = 'a';
$name = "Jones";
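The difference between the two kinds of quotes shows up as soon as a string contains a variable or a backslash sequence. A minimal sketch using the $name value from above:

```perl
use strict;
use warnings;

my $name = "Jones";

my $double = "Hello, $name\n";   # $name is replaced and \n becomes a newline
my $single = 'Hello, $name\n';   # kept exactly as typed: no interpolation

print $double;   # Hello, Jones
print $single;   # Hello, $name\n  (literally, with no newline at the end)
print "\n";
```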
Comparisons
Perl has different operators for comparing strings and numbers.
Comparison                     Strings   Numbers
equal                          eq        ==
not equal                      ne        !=
less than                      lt        <
greater than                   gt        >
less than or equal to          le        <=
greater than or equal to       ge        >=
three-way (returns -1, 0, 1)   cmp       <=>
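The choice of column matters, because the same values can compare differently as strings and as numbers. A minimal sketch with made-up values; sort with a { $a <=> $b } block is standard perl, included here only to show where <=> is typically used.

```perl
use strict;
use warnings;

my $num_equal = ("10" == 10.0)   ? 1 : 0;   # 1: numerically, 10 equals 10.0
my $str_equal = ("10" eq "10.0") ? 1 : 0;   # 0: the characters differ

my $num_cmp = 2 <=> 10;      # -1: 2 is numerically smaller than 10
my $str_cmp = "2" cmp "10";  #  1: "2" sorts after "1", character by character

my @nums     = (10, 2, 33);
my @by_text  = sort @nums;                  # default sort compares as strings
my @by_value = sort { $a <=> $b } @nums;    # <=> sorts numerically

print "by_text: @by_text\n";     # by_text: 10 2 33
print "by_value: @by_value\n";   # by_value: 2 10 33
```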
Arrays
Arrays are ordered lists of scalars. Array variable names are always
preceded by the @ character. An array is assigned a list of
comma-separated values surrounded by parentheses.
Here are some array assignment examples:
@teachers = ("Vose", "Berry", "Vander Zanden");
@letters = ('a' .. 'z');    # the range 'a' .. 'z' lists all 26 letters
@percentages = (33.5555,55.0,11,23.44);
Individual array elements are accessed with a $ prefix and an integer
index in square brackets [ ]. As in the C programming language, array
indices always begin at 0.
To access the letter c in the array @letters you would refer to $letters[2].
To determine the number of elements in an array, assign the array to a
scalar variable as seen below:
$count = @letters;
The scalar $count receives the value 26, which is the number of elements
in the array @letters.
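The indexing and counting rules above can be checked with a short sketch. It builds @letters with the range operator 'a' .. 'z' and also shows a negative index, which counts back from the end of the array.

```perl
use strict;
use warnings;

my @letters = ('a' .. 'z');    # the range operator builds the list a, b, ..., z

my $third = $letters[2];       # indices start at 0, so this is "c"
my $count = @letters;          # an array assigned to a scalar gives its size: 26
my $last  = $letters[-1];      # a negative index counts from the end: "z"

print "third=$third count=$count last=$last\n";
```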
Perl provides built-in push and pop functions which operate on
arrays, treating them as stacks.
Using our array examples above, observe these uses of push and pop:
push(@teachers, "Dongarra");   # adds the string to the end of @teachers
                               # Now @teachers = ("Vose", "Berry",
                               #   "Vander Zanden", "Dongarra")
$last = pop(@teachers);        # $last = "Dongarra"
                               # Now @teachers = ("Vose", "Berry", "Vander Zanden")
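Since push and pop both work at the end of an array, popping everything off a stack returns the elements in reverse order. A minimal sketch using the @teachers list from above:

```perl
use strict;
use warnings;

my @teachers = ("Vose", "Berry", "Vander Zanden");

push(@teachers, "Dongarra");       # the stack now holds 4 names
my $size_after_push = @teachers;

my $last = pop(@teachers);         # removes and returns "Dongarra"
my $size_after_pop = @teachers;

# Popping until the array is empty visits the elements in reverse order.
my @reversed;
while (@teachers) {                # an array is true while it is non-empty
    push(@reversed, pop(@teachers));
}

print "last=$last sizes=$size_after_push,$size_after_pop\n";
print "reversed: @reversed\n";     # reversed: Vander Zanden Berry Vose
```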
Hashes
Hashes are unordered sets of key/value pairs. Hash variable names are
always preceded by the % character. Each key is paired with its value
using either a comma , or an arrow => . Pairs are separated by commas,
and the entire assignment is surrounded by parentheses.
Here are some hash assignment examples:
%grades = ('Mackey', 'A', 'Frost', 'B+', 'Jhonston', 'C', 'Toms', 'A');
%jersey_numbers = (Ripken => 8, Hayes => 22, Ruth => 3);
One way to access a value from its key is shown below:
$number = $jersey_numbers{"Ripken"};   # $number = 8
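Beyond single lookups, a hash can be extended by assigning to a new key, tested with exists, and walked with keys. A sketch using the %jersey_numbers hash above; the added "Gehrig" entry is illustrative only.

```perl
use strict;
use warnings;

my %jersey_numbers = (Ripken => 8, Hayes => 22, Ruth => 3);

my $number = $jersey_numbers{"Ripken"};    # 8
$jersey_numbers{"Gehrig"} = 4;             # assigning to a new key adds a pair

my $has_ruth = exists $jersey_numbers{"Ruth"} ? 1 : 0;   # 1

# Hashes are unordered, so sort the keys to get predictable output.
foreach my $player (sort keys %jersey_numbers) {
    print "$player wears $jersey_numbers{$player}\n";
}
```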
Perl uses filehandles to control input and output. Perl has 3 built-in
filehandles which are opened automatically for each program. These are
STDIN, STDOUT, and STDERR.
Additional filehandles are created with the open function.
open(DATA, "myfile.text"); # opens myfile.text for reading
# with the filehandle DATA
open(OUT,">myfile.text"); # opens myfile.text for writing
# with the filehandle OUT
open(OUT2,">>myfile.text"); # opens myfile.text for appending
# with the filehandle OUT2
open returns true if the file is successfully opened, and false on
failure.
To read a line from a file, surround its filehandle with angle brackets
(the "diamond" operator): <DATA>
See perltest1. Now notice how perltest2 accomplishes the same results.
In perltest2 note how $_ is used as a default variable.
Perl provides the close function to release a filehandle when you are done with it:
close(OUT2); # closes filehandle OUT2
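Putting open, the angle-bracket input operator, the default variable $_, and close together gives the usual read loop. The sketch below first writes a scratch file so that it is self-contained; the file name myfile.text comes from the examples above, and the script deletes the file at the end with unlink. It uses the common "open(...) or die" idiom, which relies on open returning false on failure; $! holds the system's error message.

```perl
use strict;
use warnings;

my $file = "myfile.text";   # scratch file in the current directory

# Write two lines, stopping with a message if the file cannot be opened.
open(OUT, ">$file") or die "Cannot open $file for writing: $!";
print OUT "first line\n";
print OUT "second line\n";
close(OUT);

# Read them back; each line read by <IN> is placed in $_ by default.
open(IN, $file) or die "Cannot open $file for reading: $!";
my @lines;
while (<IN>) {
    chomp;              # with no argument, chomp removes the newline from $_
    push(@lines, $_);
}
close(IN);

unlink($file);          # remove the scratch file
print "read ", scalar(@lines), " lines: @lines\n";
```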
Text Processing with Perl
Perl provides a large number of built-in text-processing functions. We will
cover only some of the most popular ones. Once again, refer to perl's man
pages for further information.
Pattern Matching
Perl uses forward slashes to delimit regular expressions for pattern
matching and substitution. The =~ operator applies a pattern to a string;
the match evaluates to true if the pattern is found and false otherwise.
$a = "Mary had a little lamb";
$a =~ /little/      # evaluates to true
$a =~ /brittle/     # evaluates to false
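Written as a complete script, the match examples look like this. The variable is named $string rather than $a, since $a also has a special role in sort blocks; !~ is perl's negated match operator.

```perl
use strict;
use warnings;

my $string = "Mary had a little lamb";

my $found   = ($string =~ /little/)  ? 1 : 0;   # 1: "little" occurs in the string
my $missing = ($string =~ /brittle/) ? 1 : 0;   # 0: no match
my $absent  = ($string !~ /brittle/) ? 1 : 0;   # 1: !~ is true when there is no match

if ($string =~ /lamb/) {
    print "The string mentions a lamb\n";
}
```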
Perl provides a set of modifying characters for string matching, some of
these are shown below:
Modifier Meaning
i   matches characters regardless of case
g   matches globally (finds every occurrence, not just the first)
s   treats the string as a single line (. also matches \n)
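A sketch of the three modifiers on made-up strings. In list context, a /g match returns every matching substring, which makes it an easy way to count occurrences.

```perl
use strict;
use warnings;

my $text = "The cat sat. THE END. the last word";

my $case_sensitive   = ($text =~ /the end/)  ? 1 : 0;   # 0: the case differs
my $case_insensitive = ($text =~ /the end/i) ? 1 : 0;   # 1: /i ignores case

# In list context, /g returns every match rather than just the first.
my @all_the = ($text =~ /the/gi);     # "The", "THE", "the"
my $count   = @all_the;

# /s lets . match a newline as well.
my $two_lines = "first\nsecond";
my $without_s = ($two_lines =~ /first.second/)  ? 1 : 0;   # 0
my $with_s    = ($two_lines =~ /first.second/s) ? 1 : 0;   # 1

print "matches with /gi: @all_the ($count)\n";
```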
Perl uses a set of meta-characters to extend the functionality of pattern
matching. Below is a table of commonly used meta characters.
Metacharacter Meaning
.       matches any single character except \n
^       anchors the match to the beginning of the string
$       anchors the match to the end of the string
*       matches the preceding character 0 or more times
+       matches the preceding character 1 or more times
?       matches the preceding character 0 or 1 times
[...]   matches any one character from the bracketed class
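The metacharacters are easiest to compare side by side. This sketch tests each pattern against a small, made-up word list using perl's built-in grep, which keeps the elements for which the pattern matches.

```perl
use strict;
use warnings;

my @words = ("color", "colour", "colouur", "discolor", "cat", "cot", "cut");

my @q    = grep { /^colou?r$/ } @words;  # ?: "u" 0 or 1 times  -> color colour
my @plus = grep { /^colou+r$/ } @words;  # +: "u" 1 or more    -> colour colouur
my @star = grep { /^colou*r$/ } @words;  # *: "u" 0 or more    -> color colour colouur

my @starts = grep { /^color/ } @words;   # ^ anchors at the start -> color
my @ends   = grep { /color$/ } @words;   # $ anchors at the end   -> color discolor

my @dot   = grep { /^c.t$/ } @words;     # . is any one character -> cat cot cut
my @class = grep { /^c[ao]t$/ } @words;  # [ao] is a or o         -> cat cot

print "c.t matches: @dot\n";
```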
Perl also has a set of special characters preceded by a backslash,
some of which are listed below.
Special Character Meaning
\s any whitespace
\S any non whitespace
\d any digit i.e. [0-9]
\w any word character, i.e. [a-zA-Z0-9_]
\n a newline
\t a tab
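A sketch combining the backslash classes on one made-up record: \d+ with /g pulls out every run of digits, split on \s+ breaks the line at runs of whitespace, and \w+ shows that the underscore counts as a word character while = does not.

```perl
use strict;
use warnings;

my $record = "order   1999\titem_count=42";   # spaces, then a tab (\t)

my @numbers = ($record =~ /\d+/g);      # runs of digits: 1999, 42
my @fields  = split(/\s+/, $record);    # split at runs of whitespace
my @words   = ($record =~ /\w+/g);      # runs of [a-zA-Z0-9_]

print "numbers: @numbers\n";
print "fields: ", scalar(@fields), "\n";
print "words: @words\n";
```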
Substitution
Perl provides a simple way of searching for patterns and substituting
new text in their place. This is accomplished by placing an s before
a slash-delimited regular expression. This is an extremely powerful text
processing tool.
s/string1/string2/i   # replaces the first instance of string1 with
                      # string2
                      # the /i makes the search case-insensitive
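A sketch of s/// with and without the /g modifier. A pattern such as [tT][iI][mM][eE] matches "time" in any mix of case, which is exactly what /time/i does; the sample sentence here is made up. Note also that s/// returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $line = "Time is short. I have no time.";

(my $first = $line) =~ s/time/money/i;    # copy, then replace the first match (any case)
(my $every = $line) =~ s/time/money/gi;   # /g replaces every match

my $plain = $line;
my $n = ($plain =~ s/time/money/g);       # case-sensitive: only "time" matches

print "$first\n";   # money is short. I have no time.
print "$every\n";   # money is short. I have no money.
print "$n substitution(s) in: $plain\n";
```

The (my $copy = $original) =~ s/.../.../ form leaves $line unchanged, which is handy when the original text is still needed.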
See the following files for examples:
example1 reads from <STDIN> and writes each line in reverse order to
<STDOUT>
example1A places lines from <STDIN> into a buffer and prints those lines
in reverse order after reading an EOF <CONTROL d>
example2 reads from <STDIN> and writes each line in reverse order to the
file revline.txt
example3 sorts an array and prints its sorted elements to STDOUT
example4 reads a series of strings from <STDIN> and replaces each
match of the pattern [tT][iI][mM][eE] with the string "money"
Please submit the following exercises to your TA (by e-mail) no later than
5pm on Tuesday, Feb. 16: Exercise 1, Exercise 2.