- Path: sparky!uunet!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!bu.edu!transfer!ceylon!NewsWatcher!user
- From: hhg1@gte.com (Hal German)
- Newsgroups: comp.lang.perl
- Subject: SAS compared to Perl
- Message-ID: <hhg1-110992064042@132.197.14.85>
- Date: 11 Sep 92 11:40:35 GMT
- Sender: news@ceylon.gte.com
- Followup-To: comp.lang.perl
- Organization: GTE Labs
- Lines: 511
-
-
- This is the text version of a paper presented at the Boston Area SAS User's
- Group recently. It will also be in the proceedings of the Northeast SAS
- Users Group. Enjoy!
-
- Report Writing on a Budget: Using Perl
-
-
- Hallett German
-
- GTE Laboratories, Inc.
-
-
- Introduction
- At last year's NESUG, Marge Scerbo presented an
- interesting paper showing how a few simple SAS(R)
- DATA step statements could be used to generate
- powerful and customizable reports.
-
- As I read through the paper, I wondered "Gee, I
- could do most of this in Perl. Or can I?" This paper
- is a response to that thought. The following is
- an outline of the paper:
-
- 1. What is Perl?
- 2. How can I learn more about Perl?
- 3. Perl Concepts
- 4. Basic Reports -- SAS vs. Perl
- * Input Forms
- * Reports
- 5. Conclusions
- 6. References
-
- After reading the paper, you should have a good
- overview of Perl's reporting capabilities and
- hopefully be encouraged to create your own reports
- with this command language.
-
- What is Perl?
- Perl was developed by Larry Wall starting in 1986. It
- officially stands for Practical Extraction and Report
- Language. [But there are those who say that like
- SAS it is a group of letters with no meaning in itself.
- You be the judge.]
-
- Perl is a powerful command language that has
- elements of C, UNIX shells, awk, sed, and much
- more. The result is a self-contained portable
- language. Perl is now almost a de facto standard with
- UNIX system administrators. [It also is used internally
- at the SAS Institute.]
-
- Perl is also appealing because it is distributed with
- source and is available free as part of GNU public
- software. It can be obtained via e-mail or from various
- anonymous ftp sites. Perl can now be found under
- AmigaOS, Atari OS, DOS [it runs fine under MS-Windows],
- Macintosh, UNIX, and VMS.
-
- Perl contains many different elements:
- -- Over 100 built-in functions
- -- A rich built-in library
- -- networking capabilities
- -- database capabilities
- -- C interfaces
- -- debugger
- -- report capabilities
- -- converters (awk, sed, C header libraries to Perl)
-
- Many utilities and interfaces have been built with
- Perl. These include interfaces to Oracle, Sybase,
- Curses, and X Windows.
-
- How can I learn more about Perl?
-
- Here are some places to look:
-
- * A free man (help) document has over 100 pages on
- Perl. A formatted copy can be obtained from the
- anonymous ftp site chem.bu.edu.
-
- * Various conferences give tutorials on Perl. These
- include USENIX, SUG (SUN), and DECUS (DEC).
-
- * The Usenet group comp.lang.perl is a treasure
- trove of Perl tips. Perl's creator Larry Wall is actively
- posting useful messages there.
-
- * Once a month, a FAQ (frequently asked questions)
- list is posted on comp.lang.perl
-
- * The Wall and Schwartz book (see references) is
- considered the source on Perl. An advanced Perl
- book is planned.
-
- * The German book covers Perl portability and has
- a healthy number of Perl references.
-
- Perl Concepts
-
- Before looking at our first Perl report, it is helpful to
- understand the following:
-
- * Perl statements usually are in lowercase except for
- filehandles and subroutines.
-
- * Perl statements must end with a semicolon.
- [Making SAS users feel right at home.]
-
- * A series of statements may be processed as a
- block. A block is contained within braces, i.e. {}.
-
- * Comments begin with a #.
-
- * Perl supports a number of data types each with its
- own unique identifier:
-
- $ -- Scalar variables may contain numbers
- (including decimals), characters, or Boolean values (1,0).
- Scalars also may hold the elements of simple and
- associative arrays.
-
- examples $a = 1; #Assigned a number
- $a = "dog"; #Assigned string
-
- - @ -- Simple arrays. Can contain elements with
- numbers or characters. Each element is designated
- by a numeric key marking the position in the array.
-
- examples @array1 #entire array
- $array1[0] #First element in array
- $array1[$#array1] #Last element in
- #array.
-
- % -- Associative arrays. Can contain elements
- with numbers or characters. Each element is
- designated by a key, which may be a number OR a
- character string, rather than by a position in the
- array. Associative arrays are beyond the scope of
- this paper.
-
- * The following are some of the functions that are
- used in these examples:
-
- close. Closes an open file.
-
- die. If a condition is met, then die (end the program) with
- an optional message. A warn function is also
- available.
-
- open. A powerful command. May open a file
- for reading (default), writing, or both! An alias for the
- file (a filehandle) is assigned by the user. (Like SAS's libref or
- fileref component in a LIBNAME or FILENAME
- statement.) Also may be used like SAS's LIBNAME
- PIPE/FILENAME PIPE statements to pipe data
- to or from an operating system command.
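The data types above can be seen together in a few lines. This is only a minimal sketch in the same style as the paper's examples; the variable names and values are made up for illustration.

```perl
# Scalars hold a number or a string; the same variable can hold either.
$count = 3;
$name  = "Brown University";

# A simple array is indexed from 0; $#univs is the index of its last element.
@univs = ("Brown University", "Cornell", "UCLA");
$first = $univs[0];
$last  = $univs[$#univs];
$total = $#univs + 1;          # number of elements in the array

# A block in braces groups statements; here it runs once per element.
foreach $u (@univs) {
    print "$u\n";
}
print "$total universities, from $first to $last\n";
```

Note that the element of @univs is written $univs[0], with a $, because a single element is a scalar.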
-
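As a rough illustration of open, die, and close working together, the sketch below writes a file, appends to it, and reads it back. The file name concepts.txt and the filehandle names OUT and IN are assumptions chosen for this example.

```perl
$file = "concepts.txt";        # hypothetical scratch file name

# ">" opens for writing (creating or truncating the file).
open(OUT, ">$file") || die "Can't open $file for writing: $!\n";
print OUT "Brown University\n";
close(OUT);

# ">>" opens for appending.
open(OUT, ">>$file") || die "Can't open $file for appending: $!\n";
print OUT "Cornell\n";
close(OUT);

# No prefix opens for reading, which is the default.
open(IN, $file) || die "Can't open $file for reading: $!\n";
@lines = <IN>;                 # read every line into an array
close(IN);

$nlines = $#lines + 1;
print "Read $nlines lines\n";
```

The || die idiom stops the program with the system error text in $! if the file cannot be opened, so later statements never run against a file that is not there.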
- Basic Reports -- SAS vs Perl: Input Forms
-
- [Do note that all examples shown are "standard Perl"
- and should be portable across operating systems. I
- created these examples on MS-DOS or a Macintosh
- and ran them on UNIX "as is!"]
-
- Data may be entered in two ways: interactively
- and non-interactively.
-
- Interactively:
- The following is a simple program that takes user
- input and writes it to a file. The chop function
- removes the newline.
-
- open(FILE1,">>input.txt") || die "Can't open input.txt $!\n";
- $cnt = 0;
- do {
-     print "Enter the NAME of the University\n";
-     $univ = <STDIN>;
-     chop($univ);                    #Remove the newline
-     $univ = substr($univ,0,20);     #Keep at most 20 characters
-     print "Enter the CITY of the University\n";
-     $city = <STDIN>;
-     chop($city);
-     $city = substr($city,0,15);     #Keep at most 15 characters
-     print FILE1 "$univ $city \n";
-     $cnt++;
-     print "Do you wish to enter another record? Y/N\n";
-     $choice = substr(<STDIN>,0,1);
- } while ($choice eq "Y" || $choice eq "y");
- close(FILE1);
- print "$cnt records added\n";
-
-
- This approach is ideal for small databases. A rich
- range of data checking is possible.
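One possible form of data checking is sketched below. The subroutine check_field is a hypothetical helper, not part of Perl, and the two rules it applies (value must not be blank, value must fit the field width) are assumptions picked to match the 20-character limit used for the university name.

```perl
# check_field is a hypothetical helper: it returns an error message,
# or an empty string when the value is acceptable.
sub check_field {
    local($label, $value, $maxlen) = @_;
    return "$label is empty" if $value =~ /^\s*$/;
    return "$label is longer than $maxlen characters"
        if length($value) > $maxlen;
    return "";
}

foreach $try ("Brown University", "", "Massachusetts Institute of Technology") {
    $err = &check_field("NAME", $try, 20);
    if ($err eq "") {
        $msg = "ok: $try\n";
    } else {
        $msg = "rejected: $err\n";
    }
    print $msg;
}
```

In an interactive program, a non-empty result could be printed and the prompt repeated until the user enters an acceptable value.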
-
- Non-interactively:
-
- For smaller files, you can pre-build an array that
- contains values:
-
- @array1= ("Brown University Providence",
- "Cornell Ithaca");
-
- For larger files, it is recommended to use
- compressed files or dbm files:
-
- Compressed (Binary) Files: Records with variable-length
- fields are packed into fixed-length records and
- unpacked again using the pack/unpack functions. This is
- shown a little later in the paper. Because every record
- has the same length, these files can also be set up as
- random-access files.
-
- DBM files. DBM stands for Data Base Management.
- DBM is available in some format for all Perl
- interpreters except those for the Amiga and the Macintosh.
- This is done using associative arrays and is beyond
- the scope of this paper.
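Before the full example, a small in-memory round trip may help show what pack and unpack do to a record. This is a sketch only; the template "A20 A15" is the one used later in the paper, and the sample values are arbitrary.

```perl
# "A20 A15" means: a 20-character field, then a 15-character field,
# each padded on the right with blanks.
$template = "A20 A15";

$record = pack($template, "Cornell", "Ithaca");
$reclen = length($record);          # 20 + 15 = 35 characters

# unpack reverses the operation; the "A" format strips the padding.
($univ, $city) = unpack($template, $record);

print "record length: $reclen\n";
print "university: [$univ] city: [$city]\n";
```

Every packed record comes out 35 characters long no matter how short the original values were, which is what makes fixed offsets into the file possible.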
-
- Basic Reports -- SAS vs Perl: Reports
-
- Report #1 -- A Simple List
-
- The following report should be produced:
-
- BROWN UNIVERSITY
- PROVIDENCE
- CORNELL
- ITHACA
- UNIV OF MARYLAND
- BALTIMORE
- UCLA
- LOS ANGELES
- COLUMBIA
- NYC
- SYRACUSE UNIV.
- SYRACUSE
-
- To do this, the program will also: 1) split the "fields"
- of the "record" to appear on two lines and 2) convert
- the values of these fields to uppercase regardless
- of their original case.
-
- Here is the program that creates both the input
- record and the report:
-
- #Example 1 -- Standard Approach.
- #
- ########################################
- # a. Create an array                   #
- #    Fields are separated by a *       #
- #    because the names contain blanks  #
- ########################################
- $fileo = "ex1.txt"; #Set value for file
- @array1= ("Brown University*Providence",
-           "Cornell*Ithaca",
-           "Univ of Maryland*Baltimore",
-           "UCLA*Los Angeles",
-           "Columbia*NYC",
-           "Syracuse Univ.*Syracuse");
- ########################################
- # b. Open a file for writing           #
- ########################################
- open(EX1,">$fileo") || die "Can't open $fileo $!\n";
- foreach $cnt (0 .. $#array1) {
-     ########################################
-     # c. Split the "record" into two fields #
-     ########################################
-     ($univ,$loc) = split(/\*/,$array1[$cnt]);
-     ########################################
-     # d. Translate record to uppercase     #
-     ########################################
-     ($university = $univ) =~ tr/a-z/A-Z/;
-     ($location = $loc) =~ tr/a-z/A-Z/;
-     ########################################
-     # e. Write out record                  #
-     ########################################
-     print EX1 "$university\n$location\n";
- }
- close(EX1); #Close file after the last record
-
-
- Note that a scalar variable contains the value of the
- file name. This allows you to easily change a file
- name IN ONE PLACE ONLY when needed.
-
- Report #2 -- A Formatted List
-
-
- A formatted list like the one below can also be created
- with Perl.
-
-     BROWN UNIVERSITY      PROVIDENCE
-              CORNELL          ITHACA
-     UNIV OF MARYLAND       BALTIMORE
-                 UCLA     LOS ANGELES
-             COLUMBIA             NYC
-       SYRACUSE UNIV.        SYRACUSE
-
-
- Note that it would be easy to add the UNIV text as in
- Marge's example.
-
- The following part creates the binary file:
-
- #Example 2 -- Fixed Records (Use Pack/Unpack) Input Part
- ####################################
- #a. Create an array                #
- ####################################
- @univs = ( "Brown University", "Providence",
-            "Cornell", "Ithaca",
-            "Univ of Maryland", "Baltimore",
-            "UCLA", "Los Angeles",
-            "Columbia", "NYC",
-            "Syracuse Univ.", "Syracuse");
- ####################################
- #b. Open a file for writing        #
- ####################################
- open (EX2,">ex2.txt")
-     || die "Can't open ex2.txt $!\n"; #exception handling
- ####################################
- #c. Go through array               #
- ####################################
- foreach $i (0 .. $#univs) {
-     ####################################
-     #d. If university (even index),    #
-     #   then assign to $university.    #
-     ####################################
-     if ($i % 2 == 0) {
-         $university = $univs[$i];
-     }
-     ####################################
-     #e. If location (odd index),       #
-     #   then assign to $location       #
-     #   write out "packed" record      #
-     ####################################
-     if ($i % 2 == 1) {
-         $location = $univs[$i];
-         #A20 + A15 gives a 35-character record
-         $line = pack("A20 A15",$university,$location);
-         print EX2 $line;
-     }
- }
- close(EX2); #Close file after the last record
-
-
- This example is used to retrieve and unpack the
- records from the file and create the report:
-
- # Example 2 -- Fixed Records (Use Pack/UnPack) Report Part
- ##########################
- # a. open file and       #
- #    retrieve packed line #
- ##########################
- file_part:
- open (EXP2,"ex2.txt")
-     || die "Can't open ex2.txt $!\n";
- $line = <EXP2>;   #File is one long line with no newline, so no chop
- close(EXP2);
- ##########################
- # b. Loop through line:  #
- #    Unpack line         #
- #    Strip extra blanks  #
- #    Rejoin line         #
- #    Set line to uppercase #
- #    Print line          #
- ##########################
- rpt_part:
- $reclen = 35;     #A20 + A15 = 35 characters per record
- $len = length($line);
-
- for($offset=0;($offset<$len);$offset=$offset+$reclen) {
-     $lin = substr($line,$offset,$reclen);
-     ($univ,$loc) = unpack("A20 A15",$lin);
-     @univ=split(' ',$univ);            #Trim extra blanks
-     @loc=split(' ',$loc);
-     $unn = join(' ',@univ);            #Rejoin all the words
-     ($univ= $unn) =~ tr/a-z/A-Z/;      #Change to uppercase
-     $lon = join(' ',@loc);
-     ($loc= $lon) =~ tr/a-z/A-Z/;
-     printf "%20s %15s\n",$univ,$loc;   #formatted print
- }
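Because every packed record has the same length, a single record can also be fetched directly instead of reading the whole file. The following is a minimal sketch of that random-access idea using seek and read; the file name univ.dat and the record number are assumptions for the example.

```perl
$file   = "univ.dat";      # hypothetical file name
$reclen = 35;              # length of one "A20 A15" record

# Build a small file of fixed-length records.
open(DAT, ">$file") || die "Can't open $file $!\n";
print DAT pack("A20 A15", "Brown University", "Providence");
print DAT pack("A20 A15", "Cornell", "Ithaca");
print DAT pack("A20 A15", "UCLA", "Los Angeles");
close(DAT);

# Fetch record number $n (counting from 0) without reading the others.
$n = 2;
open(DAT, $file) || die "Can't open $file $!\n";
seek(DAT, $n * $reclen, 0);          # 0 = offset counted from start of file
read(DAT, $record, $reclen) == $reclen || die "Short read on $file\n";
close(DAT);

($univ, $city) = unpack("A20 A15", $record);
print "Record $n: $univ, $city\n";
```

The offset of record $n is simply $n times the record length, so the cost of a lookup does not grow with the size of the file.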
-
- Example #3 Creating a formatted report using Perl.
-
- Perl has a powerful report facility that can do pretty
- much anything SAS can with PUT statements. Here
- is a simple example:
-
- University List
- University           State Zip
- BROWN UNIVERSITY     RI
- UNIV. OF MARYLAND    MD  21201
- UCLA                 CA
- COLUMBIA             NY  10005
- SYRACUSE UNIV        NY  13112
-
-
- This is the data as stored in the input file: [Note the *
- as a field delimiter]
-
- Brown University*ri*
- Univ. of Maryland*md*21201
- UCLA*CA*
- Columbia*ny*10005
- Syracuse Univ*ny*13112
-
- This is the Perl script that generated it: [Note that you
- first create a template and then use it.]
-
- #Example 3 -- Using Formatted Reports
- #Create a header format. Period = end of format.
- format HEAD1=
- University List
-
- University State Zip
-
- .
- #Define report format. @ = start of a field;
- #each < adds one left-justified character to the field.
- #The line after the picture line lists the variables.
- format EX3B=
- @<<<<<<<<<<<<<<<<<<< @<< @<<<<<
- $un, $st, $zip
-
- .
- open(EX3A,"ex3a.txt") || (die "Can't open ex3a.txt $!\n");
- open(EX3B, ">ex3b.txt") || (die "Can't open ex3b.txt $!\n");
- # System Variables $^ -- header format name
- # $~ -- report format name
- select (EX3B); $^ = "HEAD1"; $~ = "EX3B";
- while (<EX3A>) {
- chop;
- ($unn,$stt,$zipp) = split(/\*/,$_); #Parse fields
- ($un= $unn) =~ tr/a-z/A-Z/; #Set to Uppercase
- ($st= $stt) =~ tr/a-z/A-Z/;
- ($zip= $zipp) =~ tr/a-z/A-Z/;
- write(EX3B); #Write out report
- }
- close(EX3A);
- close(EX3B);
-
- Here is a list of report variables:
-
- $|    0 (default): output is buffered.
-       Nonzero: the buffer is written out
-       after every write or print.
- $%    Current page number.
- $=    Current page length. Default=60.
- $-    Number of lines left on a page
-       available for writing.
- $~    Current report format.
- $^    Current header format.
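To show a few of these variables working together, here is a small sketch that numbers pages in the header and shortens the page length. The format names PAGE_TOP and PAGE_LINE, the file name paged.txt, and the page length of 4 are all assumptions chosen to keep the example short.

```perl
$report = "paged.txt";     # hypothetical output file name

# Header format: $% is the current page number.
format PAGE_TOP =
University List          Page @<
$%
.

# Body format: one line per record.
format PAGE_LINE =
@<<<<<<<<<<<<<<<<<<< @<
$un, $st
.

open(RPT, ">$report") || die "Can't open $report $!\n";
select(RPT);               # the report variables apply to the selected file
$^ = "PAGE_TOP";
$~ = "PAGE_LINE";
$= = 4;                    # 4 lines per page: 1 header line + 3 records

foreach $rec ("BROWN UNIVERSITY*RI", "UNIV. OF MARYLAND*MD", "UCLA*CA",
              "COLUMBIA*NY", "SYRACUSE UNIV*NY") {
    ($un, $st) = split(/\*/, $rec);
    write;                 # writes to the selected file, RPT
}
$pages = $%;               # page number reached by the last write
select(STDOUT);
close(RPT);

print "Report used $pages pages\n";
```

The header is written again, with the next page number, whenever the lines left on the page ($-) run out. RPT stays selected while the records are written because $% and the other report variables refer to the currently selected file.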
-
- Many other capabilities are possible such as sorting
- records, changing lines per page, and generating
- footers. Unfortunately, it would take far more pages
- than I have to cover that material.
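As one small taste of the sorting mentioned above, the sketch below orders *-delimited records by state before they would be handed to a report. The subroutine name by_state and the choice of sort keys are assumptions for illustration.

```perl
@records = ("Columbia*ny*10005",
            "Brown University*ri*",
            "Univ. of Maryland*md*21201");

# Comparison routine: order by the second field (state), then by name.
# sort places the two records being compared in $a and $b.
sub by_state {
    @fa = split(/\*/, $a);
    @fb = split(/\*/, $b);
    $fa[1] cmp $fb[1] || $fa[0] cmp $fb[0];
}

@sorted = sort by_state @records;
foreach $rec (@sorted) {
    ($un, $st, $zip) = split(/\*/, $rec);
    print "$st $un\n";
}
```

The sorted array can then be looped over with write, exactly as the unsorted input is in Example 3.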
-
- Conclusions
- This can only be the briefest of introductions to Perl's
- reporting capabilities. Perl offers a strong (and free)
- alternative to SAS for simple reports. The
- reader is encouraged to try the examples and read
- the suggested references. Posters in future years
- may discuss some of Perl's advanced reporting
- capabilities and how to create interactive Perl
- applications.
-
- Getting in touch with me/Trademarks
-
- Hallett German
- GTE Laboratories Inc
- 40 Sylvan Road
- Waltham, Ma 02254
- 617-466-2290
- hhg1@bunny.gte.com
-
- SAS(R) and all other SAS products mentioned are
- registered trademarks of the SAS Institute.
-
- References [Annotated]
- Bates, Douglas "Data Manipulation in Perl"
- Unpublished Paper pp1-6.
- [Strongly recommended. Has a good section on
- how to use Perl to clean up datafiles. Some of this
- capability was added into the 6.07 release.]
-
- German, Hallett Command Language Cookbook
- 1992 Van Nostrand Reinhold pp. 247-305
- [Has plenty of Perl references and a good discussion
- on Perl portability.]
-
- Scerbo, Marge "Data Step Reporting" NESUG 91
- Proceedings 1991 pp. 60-66
- [If you want to see how to generate the same
- examples using SAS, look at Marge's paper.]
-
- Wall, Larry and Randall L. Schwartz Programming
- Perl 1991 O'Reilly & Associates. pp 1-42, 106-118
- [The Perl "bible". Also called the Camel book
- because of what is on the cover. A reference, tutorial,
- and code ideas book all in one place. Strongly
- recommended.]
-