Text File | 1990-03-10 | 23.2 KB | 1,028 lines |
-
-
- PERL
-
-
- a language by Larry Wall
-
-
- Practical Extraction and Report Language
-
-
- or
-
-
- Pathologically Eclectic Rubbish Lister
-
-
- Tom Christiansen
- CONVEX Computer Corporation
-
-
- ---------------------------------------
-
- Overview
-
-
- o What is Perl: features, where to get it, preview
-
-
- o Data Types: scalars and arrays
-
-
- o Operators
-
-
- o Flow Control
-
-
- o Regular Expressions
-
-
- o I/O: regular I/O, system functions, directory access,
- formatted I/O
-
-
- o Functions and Subroutines: built-in array and string
- functions
-
-
- o Esoterica: suid scripts, debugging, packages, command line
- options
-
-
- o Examples
-
-
- ---------------------------------------
-
- What is Perl?
-
-
- o An interpreted language that looks a lot like C with
- built-in sed, awk, and sh, as well as bits of csh, Pascal,
- FORTRAN, and BASIC-PLUS thrown in.
-
-
- o Highly optimized for manipulating printable text, but also
- able to handle binary data.
-
-
- o Especially suitable for system management tasks due to
- interfaces to most common system calls.
-
-
- o Rich enough for most general programming tasks.
-
-
- o "A shell for C programmers." [Larry Wall]
-
- ---------------------------------------
-
- Features
-
-
- o Easy to learn because much derives from existing tools.
-
-
- o More rapid program development because it's an interpreter
-
-
- o Faster execution than shell script equivalents.
-
-
- o More powerful than sed, awk, or sh; a2p and s2p translators
- supplied for your old scripts.
-
-
- o Portable across many different architectures.
-
-
- o Absence of arbitrary limits like string length.
-
-
- o Fits nicely into UNIX tool and filter philosophy.
-
-
- o It's free!
-
-
- ---------------------------------------
-
- Where to get it
-
-
- o Any comp.sources.unix archive
-
-
- o Famous archive servers
- o uunet.uu.net 192.48.96.2
-
- o tut.cis.ohio-state.edu 128.146.8.60
-
-
- o Its author, Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>
- o jpl-devvax.jpl.nasa.gov 128.149.1.143
-
-
- o Perl reference guide (in PostScript form) also available
- from Ohio State, along with some sample scripts and archives
- of the perl-users mailing list.
-
-
- o USENET newsgroup comp.lang.perl is a good source for
- questions, comments, examples.
-
- ---------------------------------------
-
- Preview
-
-
- o It's not for nothing that perl is sometimes called the
- "pathologically eclectic rubbish lister." Before you drown
- in a deluge of features, here's a simple example to whet
- your appetites that demonstrates the principal features of
- the language, all of which have been present since version
- 1.
-
-
- while (<>) {
-     next if /^#/;
-     ($x, $y, $z) = /(\S+)\s+(\d\d\d)\s+(foo|bar)/;
-     $x =~ tr/a-z/A-Z/;
-     $seen{$x}++;
-     $z =~ s/foo/fear/ && $scared++;
-     printf "%s %08x %-10s\n", $z, $y, $x
-         if $seen{$x} > $y;
- }
-
-
- ---------------------------------------
-
- Data Types
-
-
- o Basic data types are scalars, indexed arrays of scalars, and
- associative arrays of scalars.
-
-
- o Scalars themselves are either string, numeric, or boolean,
- depending on context. Values of 0 (zero), '0', and '' (null
- string) are false; all else is true.
-
-
- o Type of variable determined by leading special character.
-
- o $ scalar
- o @ indexed array (lists)
-
- o % associative array
- o & function
-
-
- o All data types have their own separate namespaces, as do
- labels, functions, and file and directory handles.
-
- ---------------------------------------
-
- Data Types (scalars)
-
-
- o Use a $ to indicate a scalar value
-
-
- $foo = 3.14159;
-
-
- $foo = 'red';
-
-
- $foo = "was $foo before"; # interpolate variable
-
-
- $host = `hostname`; # note backticks
-
-
- ($foo, $bar, $glarch) = ('red', 'blue', 'green');
-
-
- ($foo, $bar) = ($bar, $foo); # exchange
-
- ---------------------------------------
-
- Special Scalar Variables
-
-
- o Special scalars are named with punctuation (except $0).
- Examples are
-
-
- o $0 name of the currently executing script
-
- o $_ default for pattern operators and implicit I/O
- o $$ the current pid
-
- o $! the current system error message from errno
- o $? status of last `backtick`, pipe, or system
-
- o $| if nonzero, forces a flush after every write or print
- o $. the current line number of last input
-
- o $[ array base, 0 by default; awk uses 1
- o $< the real uid of the process
-
- o $( the real gid of the process
- o $> the effective uid of the process
-
- o $) the effective gid of the process
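-
- A minimal sketch of a few of these variables in use. It assumes
- a UNIX system with sh on the search path; the path in $missing
- and the handle name MISSING are hypothetical, chosen only so that
- the open fails.
-

```perl
# Identify the running process.
printf "script %s, pid %d, real uid %d, effective uid %d\n", $0, $$, $<, $>;

# $! carries the system error text after a failed call.
$missing = "/no/such/dir/file.$$";       # hypothetical path, expected not to exist
unless (open(MISSING, $missing)) {
    $err = "$!";                         # copy the message before it changes
    print "open failed: $err\n";
}

# $? carries the status of the last system, pipe, or backtick;
# the exit code sits in the high byte.
system('sh', '-c', 'exit 3');
$code = $? >> 8;
print "child exit code: $code\n";
```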
-
- ---------------------------------------
-
- Data types (arrays)
-
-
- o Indexed arrays (lists); $ for one scalar element, @ for all
-
- $foo[$i+2] = 3;          # set one element to 3
- @foo = ( 1, 3, 5 );      # init whole array
- @foo = ( );              # initialize empty array
- @foo = @bar;             # copy whole @array
- @foo = @bar[$i..$i+5];   # copy slice of @array
-
-
- o $#ARRAY is the subscript of the last element, so the script's
- name is $0 and its arguments run from $ARGV[0] through
- $ARGV[$#ARGV], inclusive.
-
-
- o Associative (hashed) arrays; $ for one scalar element, % for
- all
-
- $frogs{'green'} += 23;             # 23 more green frogs
- $location{$x, $y, $z} = 'troll';   # multi-dim array
- %foo = %bar;                       # copy whole %array
- @frogs{'green', 'blue', 'yellow'} = (3, 6, 9);
-
- ---------------------------------------
-
- Special Array Variables
-
-
- o @ARGV command line arguments
-
-
- o @INC search path for files called with do
-
-
- o @_ default for split and subroutine parameters
-
-
- o %ENV the current environment; e.g. $ENV{'HOME'}
-
-
- o %SIG used to set signal handlers
-
-
- sub trapped {
- print STDERR "Interrupted\007\n";
-
- exit 1;
- }
-
- $SIG{'INT'} = 'trapped';
-
- ---------------------------------------
-
- Operators
-
-
- Perl uses all of C's operators except for type casting and `&'
- and `*' as address operators, plus these
-
-
- o exponentiation: **, **=
-
-
- o range operator: ..
-
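- $inheader = 1 if /^From / .. /^$/;   # From line through next blank line
- if (1..10) { do foo(); }             # input lines 1 through 10 (uses $.)
- for $i (60..75) { do foo($i); }      # list of integers
- @new = @old[30..50];                 # array slice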
-
-
- o string concatenation: ., .=
-
-
- $x = $y . &frob(@list) . $z;
-
- $x .= "\n";
-
- ---------------------------------------
-
- Operators (continued)
-
-
- o string repetition: x, x=
-
-
- $bar = '-' x 72; # row of 72 dashes
-
-
- o string tests: eq, ne, lt, gt, le, ge
-
-
- if ($x eq 'foo') { }
- if ($x ge 'red' ) { }
-
-
- o file test operators, like an augmented /bin/test, work on
- filenames (strings) or filehandles
-
-
- if (-e $file) { } # file exists
- if (-z $file) { } # zero length
-
- if (-O LOG) { } # LOG owned by real uid
- die "$file not a text file" unless -T $file;
-
-
- ---------------------------------------
-
- Flow Control
-
-
- o Unlike C, blocks always require enclosing braces {}
-
-
- o unless and until are just if and while negated
-
-
- o if (EXPR) BLOCK else BLOCK
- o if (EXPR) BLOCK elsif (EXPR) BLOCK else BLOCK
-
- o while (EXPR) BLOCK
- o do BLOCK while EXPR
-
- o for (EXPR; EXPR; EXPR) BLOCK
- o foreach $VAR (LIST) BLOCK
-
-
- o For readability, if, unless, while, and until may be used as
- trailing statement modifiers as in BASIC-PLUS
-
-
- return -1 unless $x > 0;
-
- ---------------------------------------
-
- Flow Control (continued)
-
-
- o Use next and last rather than C's continue and break
-
-
- o redo restarts the current iteration, ignoring the loop test
-
-
- o Blocks (and next, last, and redo) take optional labels for
- clearer loop control, avoiding the use of goto to exit
- nested loops.
-
-
- o No switch statement, but it's easy to roll your own
-
-
- o do takes 3 forms
- o execute a block
-
- do { $x += $a[$i++] } until $i > $j;
- o execute a subroutine
-
- do foo($x, $y);
- o execute a file in current context
-
- do 'subroutines.pl';
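-
- As an illustration of labeled loops and of a home-made switch,
- here is a small sketch. The data, the labels ROW and SWITCH, and
- the subroutine name classify are all made up for the example; a
- labeled bare block is only one of several ways to imitate switch.
-

```perl
# Labeled loops: "next ROW" and "last ROW" act on the outer loop
# even though they appear inside the inner one.
@rows = ('3 skip -1', '7 -2 9', '1 1 1');    # sample data
$found = '';
ROW: foreach $row (@rows) {
    foreach $n (split(' ', $row)) {
        next ROW if $n eq 'skip';            # abandon the rest of this row
        if ($n < 0) {
            $found = $row;
            last ROW;                        # leave both loops at once
        }
    }
}
print "first row with a negative number: $found\n";

# A home-made switch: the bare block runs once, and "last"
# jumps out of it as soon as one case has matched.
sub classify {
    local($arg, $kind) = @_;
    SWITCH: {
        if ($arg =~ /^\d+$/)       { $kind = 'number'; last SWITCH; }
        if ($arg =~ /^[A-Za-z]+$/) { $kind = 'word';   last SWITCH; }
        $kind = 'other';                     # default case
    }
    return $kind;
}
print &classify('42'), ' ', &classify('perl'), ' ', &classify('a-b'), "\n";
```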
-
- ---------------------------------------
-
- Regular Expressions
-
-
- o Understands egrep regexps, plus
-
-
- o \w, \W alphanumerics plus _ (and negation)
- o \d, \D digits (and negation)
-
- o \s, \S white space (and negation)
- o \b, \B word boundaries (and negation)
-
-
- o C-style escapes recognized, like \t, \n, \034
-
-
- o Don't escape these characters for their special mean-
-
- ing: ( ) | { } +
-
-
- o Character classes may contain metas, e.g. [\w.$]
-
-
- o Special variables: $& means all text matched, $` is text
- before match, $' is text after match.
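-
- A short sketch of the three match variables, using a made-up
- input string. The values are copied right after the match,
- since a later successful match would overwrite them.
-

```perl
$_ = 'the quick brown fox';                  # sample input
if (/quick\s+(\w+)/) {
    # Save the pieces before another match can replace them.
    ($before, $matched, $after, $first) = ($`, $&, $', $1);
    print "before='$before' matched='$matched' after='$after' first='$first'\n";
}
```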
-
-
- ---------------------------------------
-
- Regular Expressions (continued)
-
-
- o Use \1 .. \9 within regexps; $1 .. $9 outside
-
-
- if (/^this (red|blue|green) (bat|ball) is \1/)
- { ($color, $object) = ($1, $2); }
-
- ($color, $object) =
- /^this (red|blue|green) (bat|ball) is \1/;
-
-
- o Substitute and translation operators are like sed's s and y.
-
- s/alpha/beta/;
- s/(.)\1/$1/g;
-
- y/A-Z/a-z/;
-
-
- o Use =~ and !~ to match against variables
-
-
- if ($foo !~ /^\w+$/) { exit 1; }
- $foo =~ s/\btexas\b/TX/i;
-
-
- ---------------------------------------
-
- I/O
-
-
- o Filehandles have their own distinct namespaces, but are
- typically all upper case for clarity. Pre-defined filehandles
- are STDIN, STDOUT, STDERR.
-
-
- o Mentioning a filehandle in angle brackets reads next line in
- scalar context, all lines in an array context; newlines are
- left intact.
-
-
- $line = <TEMP>;
- @lines = <TEMP>;
-
-
- o <> means all files supplied on command line (or STDIN if
- none). When used this way, $ARGV is the current filename.
-
-
- o When used in a while construct, input lines are
- automatically assigned to the $_ variable.
-
-
- ---------------------------------------
-
- I/O (continued)
-
-
- o Usually iterate over file a line at a time, assigning to $_
- each time and using that as the default operand.
-
-
- while ( <> ) {
-
- next if /^#/; # skip comments
- s/left/right/g; # global substitute
-
- print; # print $_
- }
-
-
- o If not using the pseudo-file <>, open a filehandle:
-
-
- open (PWD, "/etc/passwd");
-
- open (TMP, ">/tmp/foobar.$$");
- open (LOG, ">>logfile");
-
- open (TOPIPE, "| lpr");
- open (FROMPIPE, "/usr/etc/netstat -a |");
-
-
- ---------------------------------------
-
- I/O (continued)
-
-
- o May also use getc for character I/O and read for raw I/O
-
-
- o Access to eof, seek, close, flock, ioctl, fcntl, and select
- calls for use with filehandles.
-
-
- o Access to mkdir, rmdir, chmod, chown, link, symlink (if
- supported), stat, rename, unlink calls for use with filenames.
-
-
- o Pass printf a filehandle as its first argument unless
- printing to STDOUT
-
-
- printf LOG "%-8s %s: weird bits: %08x\n",
-
- $program, &ctime, $bits;
-
-
- o Associative arrays may be bound to dbm files with dbmopen()
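-
- The character and raw I/O calls named above might be combined as
- in this sketch. The scratch file name and the handle names OUT
- and IN are hypothetical, and /tmp is assumed to be writable.
-

```perl
$tmp = "/tmp/rawio.$$";                      # hypothetical scratch file
open(OUT, ">$tmp") || die "can't write $tmp: $!";
print OUT "0123456789abcdef";                # 16 bytes, no newline
close(OUT);

open(IN, $tmp) || die "can't read $tmp: $!";
$first = getc(IN);                           # a single character
seek(IN, 10, 0);                             # move to absolute offset 10
$got  = read(IN, $buf, 4);                   # ask for 4 bytes
$tail = read(IN, $rest, 100);                # short read: only 2 bytes remain
$at_eof = eof(IN) ? 1 : 0;
close(IN);
unlink($tmp);

print "first=$first buf=$buf ($got bytes) rest=$rest ($tail bytes) eof=$at_eof\n";
```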
-
- ---------------------------------------
-
- System Functions
-
-
- A plethora of functions from the C library are provided as
- built-ins, including most system calls. These include
-
-
- o chdir, chroot, exec, exit, fork, getlogin, getpgrp, getppid,
- kill, setpgrp, setpriority, sleep, syscall, system, times,
- umask, wait.
-
-
- o If your system has Berkeley-style networking, bind, connect,
- send, getsockname, getsockopt, getpeername, recv, listen,
- socket, socketpair.
-
-
- o getpw*, getgr*, gethost*, getnet*, getserv*, and getproto*.
-
-
- o pack and unpack can be used for manipulating binary data.
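-
- Two of the facilities listed above, sketched under the
- assumption of a UNIX system where fork works. The record layout
- (a 16-bit and a 32-bit big-endian integer) and the exit code 7
- are arbitrary choices for illustration.
-

```perl
# pack builds a binary record; unpack takes it apart again.
$rec = pack('nN', 513, 65536);               # 16-bit then 32-bit, big-endian
($short, $long) = unpack('nN', $rec);
$hex = unpack('H*', $rec);                   # hex dump of the 6 bytes

# fork, exit, and wait: the child's exit code comes back in $?.
$pid = fork;
die "fork failed: $!" unless defined $pid;
exit 7 if $pid == 0;                         # child leaves at once
$reaped = wait;                              # parent collects the child
$code = $? >> 8;

# Print only after the fork so buffered output is not duplicated.
print "record $hex holds $short and $long\n";
print "child $reaped exited with code $code\n";
```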
-
-
- ---------------------------------------
-
- Directory Access
-
-
- Three methods of accessing directories are provided.
-
-
- o You may open a pipe from /bin/ls like this:
- open(FILES,"/bin/ls *.c |");
-
- while ($file = <FILES>) { chop($file); ... }
-
-
- o The directory-reading routines are provided as built-ins and
- operate on directory handles. Supported routines are
- opendir, readdir, closedir, seekdir, telldir, and rewinddir.
-
-
- o The easiest way is to use perl's file globbing notation. A
- string enclosed in angle brackets containing shell
- metacharacters evaluates to a list of matching filenames.
-
-
- foreach $x ( <*.[ch]> ) { rename($x, "$x.old"); }
- chmod 0644, <*.c>;
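-
- The second method, the directory-reading built-ins, has no
- example above; a minimal sketch follows. The handle name DIR is
- arbitrary, and the current directory is assumed to be readable.
-

```perl
$dir = '.';                                  # any readable directory
opendir(DIR, $dir) || die "can't opendir $dir: $!";
@names = grep(!/^\.\.?$/, readdir(DIR));     # drop the . and .. entries
closedir(DIR);

foreach $name (sort @names) {
    print "$name\n";
}
```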
-
-
- ---------------------------------------
-
- Subroutines
-
-
- o Subroutines called either with `do' operator or with `&'.
- Any of the three principal data types may be passed as
-
- parameters or used as a return value.
-
-
- do foo(1.43);
-
-
- do foo(@list);
-
-
- $x = &foo('red', 3, @others);
-
-
- @list = &foo(@olist);
-
-
- %foo = &foo($foo, @foo);
-
- ---------------------------------------
-
- Subroutines (continued)
-
-
- o Parameters are received by the subroutine in the special
- array @_. If desired, these can be copied to local variables.
- This is especially useful for recursive subroutines.
-
-
- $result = &simple($alpha, $beta, @tutti);
- sub simple {
-
- local($x, $y, @rest) = @_;
- local($sum, %seen);
-
- return $sum;
- }
-
-
- o Subroutines may also be called indirectly
-
-
- $foo = 'some_routine';
-
- do $foo(@list);
- ($x, $y, $z) = do $foo(%maps);
-
-
- ---------------------------------------
-
- Formatted I/O
-
-
- o Besides printf, formatted I/O can be done with format and
- write statements.
-
-
- o Automatic pagination and printing of headers.
-
-
- o Picture description facilitates lining up multi-line output
-
-
- o Fields in picture may be left or right-justified or centered
-
-
- o Multi-line text-block filling is provided (something like
- having a %s format string with a built-in pipe to fmt)
-
-
- o These special scalar variables are useful:
- o $% for current page number,
-
- o $= for current page length (default 60)
- o $- for lines left on page
-
-
- ---------------------------------------
-
- Formatted I/O (example)
-
- # a report from a bug report form; taken from perl man page
- format top =
- Bug Reports
- @<<<<<<<<<<<<<<<<<<<<<<< @||| @>>>>>>>>>>>>>>>>>>>>>>>
- $system, $%, $date
- ------------------------------------------------------------------
- .
-
- format STDOUT =
- Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $subject
- Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $index, $description
- Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $priority, $date, $description
- From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $from, $description
- Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $programmer, $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
- $description
- ~ ^<<<<<<<<<<<<<<<<<<<<<<<...
- $description
- .
-
-
- ---------------------------------------
-
- Built-in Array Functions
-
-
- o Indexed arrays function as lists; you can add items to or
- remove them from either end using these functions:
-
- o pop remove last value from end of array
- o push add values to end of array
-
- o shift remove first value from front of array
- o unshift add values to front of array
-
-
- For example
-
-
- push(@list, $bar);
-
- push(@list, @rest);
- $tos = pop(@list);
-
- while ( $arg = shift(@ARGV) ) { }
- unshift( @ARGV, 'zeroth arg', 'first arg');
-
-
- ---------------------------------------
-
- Built-in Array Functions (split and join)
-
-
- o split breaks up a string into an array of new strings. You
- can split on arbitrary regular expressions, limit the number
- of fields you split into, and save the delimiters if you
- want.
-
-
- @list = split(/[, \t]+/, $expr);
-
- while (<PASSWD>) {
- ($login, $passwd, $uid, $gid, $gcos,
-
- $home, $shell) = split(/:/);
- }
-
-
- o The inverse of split is join.
-
-
- $line = join(':', $login, $passwd, $uid,
-
- $gid, $gcos, $home, $shell);
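-
- The field limit and the saving of delimiters mentioned for split
- are not shown above. A small sketch with made-up input strings:
-

```perl
$line = 'root:x:0:0:Super User:/root:/bin/sh';   # sample passwd-style line

# A third argument limits the number of fields; the remainder
# of the string stays together in the last field.
@three = split(/:/, $line, 3);

# Parentheses in the pattern keep the delimiters in the result.
@parts = split(/([-+])/, '1+2-3');

print join(' | ', @three), "\n";
print join(' ', @parts), "\n";
```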
-
- ---------------------------------------
-
- Built-in Array Functions (sort, grep, reverse)
-
-
- o reverse inverts a list.
-
-
- foreach $tick (reverse 0..10) { }
-
-
- o sort returns a new array with the elements ordered according
- to their ASCII values. Use your own routine for different
-
- collating.
-
-
- print sort @list;
- sub numerically { $a - $b; }
-
- print sort numerically @list;
-
-
- o grep returns a new list consisting of all the elements for
- which a given expression is true. For example, this will
-
- delete all lines with leading pound signs:
-
-
- @lines = grep(!/^#/, @lines);
-
- ---------------------------------------
-
- Built-in Array Functions (%arrays)
-
-
- For manipulating associative arrays, the keys and values
- functions return indexed arrays of the indices and data values
- respectively. each is used to iterate through an associative
- array to retrieve one ($key,$value) pair at a time.
-
-
- while (($key,$value) = each %array) {
-
- printf "%s is %s\n", $key, $value;
- }
-
-
- foreach $key (keys %array) {
-
- printf "%s is %s\n", $key, $array{$key};
- }
-
-
- print reverse sort values %array;
-
-
- ---------------------------------------
-
- String functions
-
-
- o Besides the powerful regular expression features, several
- well-known C string manipulation functions are provided,
- including crypt, index, rindex, length, substr, and sprintf.
-
-
- o The chop function efficiently removes the last character
- from a string. It's usually used to delete the trailing
- newline on input lines. Like many perl operators, it works
- on $_ if no operand is given.
-
-
- chop($line);
-
- chop ($host = `hostname`);
- while (<STDIN>) {
-
- chop; ...
- }
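-
- The C-style string functions listed at the top of this slide
- can be sketched as follows; the path is just sample data.
-

```perl
$path  = '/usr/local/bin/perl';              # sample string
$slash = rindex($path, '/');                 # offset of the last slash
$base  = substr($path, $slash + 1);          # everything after it
$dir   = substr($path, 0, $slash);           # everything before it
$at    = index($path, 'local');              # offset of first 'local'
$label = sprintf("%-6s|%03d", $base, length($path));

print "$dir + $base, 'local' at $at, label '$label'\n";
```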
-
-
- ---------------------------------------
-
- String functions (continued)
-
-
- o The eval operator lets you execute dynamically generated
- code. For example, to process any command line arguments of
- the form variable=value, place this at the top of your
- script:
-
-
- eval '$'.$1."'$2';"
-
- while $ARGV[0] =~ /^([A-Za-z_]+=)(.*)/ && shift;
-
-
- The eval operator is also useful for run-time testing of
- system-dependent features which would otherwise trigger
- fatal errors. For example, not all systems support symlink
- or dbmopen; you could test for their existence by
- executing the statements within an eval and testing the special
- variable $@, which contains the text of the run-time error
- message if anything went wrong.
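-
- A sketch of that run-time test. The symlink call with empty
- arguments is expected to fail harmlessly where symlink exists
- and to raise a trappable error where it does not; the division
- is included only to show $@ catching an ordinary run-time error.
- The variable names are made up for the example.
-

```perl
# Probe for symlink support without risking a fatal error.
eval 'symlink("", "");';
$has_symlink = ($@ eq '') ? 1 : 0;
print "symlink is ", ($has_symlink ? "available" : "missing: $@"), "\n";

# Any run-time error inside eval is trapped the same way.
$zero = 0;
eval '$ratio = 1 / $zero;';
$trapped = $@;                               # error text, or '' on success
print "trapped: $trapped" if $trapped ne '';
```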
-
- ---------------------------------------
-
- Suid Scripts
-
-
- o Perl programs can be made to run setuid, and can actually be
- more secure than the corresponding C program.
-
-
- o Because interpreters have no guarantee that the filename
- they get as the first argument is the same file that was
- exec'ed, perl won't let you run a setuid script on a system
- where setuid scripts are not disabled.
-
-
- o Using a dataflow tracing mechanism triggered by setuid
- execution, perl can tell what data is safe to use and what data
- comes from an external source and thus is "tainted."
-
-
- o Tainted data may not be used directly or indirectly in any
- command that modifies files, directories or processes or
- else a fatal run-time error will result.
-
- ---------------------------------------
-
- Debugging and Packages
-
-
- o When invoked with the -d switch, perl runs your program
- under a symbolic debugger (written in perl) somewhat similar
- to sdb in syntax. Amongst other things, breakpoints may be
- set, variables examined or changed, and call tracebacks
- printed out. Because it uses eval on your code, you can
- execute any arbitrary perl code you want from the debugger.
-
-
- o Using packages you can write modules with separate
- namespaces to avoid naming conflicts in library routines.
- The debugger uses this to keep its variables separate from
- yours. Variables are accessed by the package'name notation,
- as in this line from the debugger:
-
-
- $DB'stop[$DB'line] =~ s/;9$//;
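-
- A small sketch of separate namespaces. The package name Counter
- and its contents are hypothetical. Note one assumption: the
- sketch writes the package separator as ::, which is how current
- perls spell it, rather than the ' form shown above.
-

```perl
package Counter;                             # hypothetical library package
$count = 0;
sub bump { $count++; }

package main;
$count = 100;                                # a different variable: $main::count
&Counter::bump();
&Counter::bump();

print "main count: $count, Counter count: $Counter::count\n";
```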
-
-
- ---------------------------------------
-
- Command Line Options
-
-
- The following are the more important command line switches
- recognized by perl:
-
-
- o -v print out version string
-
- o -w issue warnings about error-prone constructs
- o -d run script under the debugger
-
- o -e like sed: used to enter single command lines
- o -n loop around input like sed -n
-
- o -p as with -n but print out each line
- o -i edit files in place
-
- o -a turn on autosplit mode (like awk) into @F array
- o -P call C pre-processor on script
-
-
- ---------------------------------------
-
- Examples: Command Line
-
-
- # output current version
- perl -v
-
-
- # simplest perl program
-
- perl -e 'print "hello, world.\n";'
-
-
- # useful at end of "find foo -print"
- perl -n -e 'chop;unlink;'
-
-
- # add first and last columns (filter)
-
- perl -a -n -e 'print $F[0] + $F[$#F], "\n";'
-
-
- # in-place edit of *.c files changing all foo to bar
- perl -p -i -e 's/\bfoo\b/bar/g;' *.c
-
-
- # run a script under the debugger
-
- perl -d myscript
-
-