PEP(1L) LOCAL COMMANDS PEP(1L)
NAME
pep - a file detergent
SYNOPSIS
pep [ -a ] [ -b ] [ -c [ size ]] [ -d + | - ]
[ -e [ 0 | 1 | 2 ]] [ -g file ] [ -h ] [ -i + | - ]
[ -k + | - ] [ -m + | - ] [ -o [ b ]] [ -p ]
[ -s [ size ]] [ -t [ size ]] [ -u terminator ] [ -v ]
[ -w + | - ] [ -x ] [ -z ] [ filename ... ]
DESCRIPTION
Pep is a filter program to "clean" files. It is named after
a popular Norwegian detergent.
Pep may be used to remove control characters, strip parity
bits, interpret ANSI escape sequences, compress tabulation,
extract strings and convert character sets. Nine out of ten
hackers prefer "pep" to soap (which may very well explain
why some of them smell the way they do).
Pep is a filter. Its default operation is to read from
standard input (the keyboard) and write on standard output
(the terminal).
You may also specify the name of one or more files as the
last argument on the command line. Most versions of pep
(not the version compiled for the DEC VMS operating system)
allow ambiguous filename arguments, where a single filename
argument may specify several files.
You may instruct pep to write the result back onto the ori-
ginal input file with the -o option. If you use this
option, the original file will be lost. If you want to keep
the original file (something that usually will be the case
when you do things like extracting strings from an execut-
able file), you should make a copy of the file before apply-
ing pep, and filter the copy rather than the original. Some
of the functions in pep (in particular those selected with
the -b and -s options) may remove a lot of material from
files, and it may be unfortunate if this happens to the
wrong file. It is probably a good idea to always use pep on
copies until you have some experience with the various
pep-options. You may also use the b argument on the -o
option to save the original in a .BAK-file.
To get a brief summary of the command line syntax and all
the options, you need to specify the -h option. Just type
the command:
pep -h
followed by the RETURN key. Note that just pep will not
give you this summary. The command:
Version 2.1 Last change: 28 December 1989 1
pep
will start pep as a filter, and it will just echo back what-
ever you type, until you type the end of file character
(usually CTRL-D or CTRL-Z).
When pep is running as a filter, it reads from the standard input
and writes to the standard output. In this
state, pep will be very much less verbose than it usually
is. It will still print error messages, but very little
else. Note that while:
pep < foobar.in > foobar.out
pep -ob foobar.txt
will do more or less the same job, the first will do it
quietly, in the tradition of Unix filters; the latter will
print the copyright notice, a detailed list of the things it
will do, and finally a list and line count of all the files
it processes as it plods along.
Pep will remove some "noise" from files, even if no options
are specified. The following is the default behavior:
+ remove trailing spaces;
+ terminate each line with the canonical line termina-
tor (usually LF, CR or both);
+ remove underlining intended for backspacing
printers;
+ remove control characters (character codes < 32)
except canonical line terminator, FF and TAB;
+ break the line before the FF if a line contains an
FF anywhere except in the first column.
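The default rules listed above can be sketched in a few lines of
Python. This is only an illustration of the rules as stated, not
pep's own code: the function name clean, the LF default terminator
and the treatment of underlining as underscore/backspace pairs are
assumptions made for the example.

```python
import re

def clean(text: str, terminator: str = "\n") -> str:
    """Apply the default clean-up rules listed above to a string."""
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # input ended with a line terminator
    out = []
    for line in lines:
        # Underlining for backspacing printers: "_\bX" or "X\b_".
        line = re.sub(r"_\x08|\x08_", "", line)
        # Remove control characters except TAB (9) and FF (12).
        line = re.sub(r"[\x00-\x08\x0b\x0e-\x1f]", "", line)
        # Break the line before any FF that is not in the first column.
        pieces = line.split("\f")
        parts = [pieces[0]] + ["\f" + p for p in pieces[1:]]
        if parts[0] == "" and len(parts) > 1:
            parts = parts[1:]  # FF was already in the first column
        # Remove trailing spaces from every resulting line.
        out.extend(p.rstrip(" ") for p in parts)
    return "".join(p + terminator for p in out)
```

For example, clean("foo  \r\n") gives "foo\n".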
If you want to check what pep actually intends to do to your
file before it does it, you may make it pause with the -p
option. For example:
pep -p foobar.txt
will make pep stop after displaying a list of the conver-
sions it will apply to the file. The user is prompted and
may choose to proceed (hitting the RETURN key), or abort the
program without doing anything (hitting CTRL-C).
The user may want other conversions than the default action
described above. A number of conversion functions may be
selected by specifying one or more options on the command
line.
Some of the options require an additional argument switch,
and must be followed by a "+" or a "-"; other options
require a number or a filename argument. Most of the
options may be combined with other options, but a few are
mutually exclusive. If the user specifies invalid options
or option arguments, then pep will abort with an error mes-
sage and return an error exit code on operating systems that
support exit codes.
OPTIONS
-a Write out information about pep.
-b Remove all characters not in the original 7-bit charac-
ter set (ISO 646). I.e. remove the characters which
are encoded from 128 to 255. (If this option is com-
bined with the -x option, it will print the codes for
these characters in hexadecimal instead of removing
them.) The -b option is powerful, and may remove a lot
of bytes if you use it on the wrong file. Only use it
if you know exactly how the eighth bit is used in the
file you intend to filter. Also note that the options
-i, -d, -k, -g, -m, -w or -z in most cases are better suited
to process files where the eighth bit is set.
-c [ size ]
Compress space into tabulation. I.e. insert TAB char-
acters when replacing a run of two or more SPACE char-
acters would produce a smaller output file. This func-
tion is the opposite of the function invoked with the
-t option.
The default tabulation size is 8, but you may specify
any other tabulation with the optional numeric argu-
ment.
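A rough Python sketch of the compression rule follows. It assumes
that a run of spaces is replaced by a TAB only when the run is at
least two characters long and ends exactly on a tab stop; the name
compress_spaces is made up for the example, and pep's real
implementation may differ in details.

```python
def compress_spaces(line: str, size: int = 8) -> str:
    """Replace runs of two or more spaces that end on a tab stop by a TAB."""
    out = []
    col = 0       # column reached so far
    pending = 0   # spaces seen but not yet written
    for ch in line:
        if ch == " ":
            pending += 1
            col += 1
            if col % size == 0:
                # The run reaches a tab stop: a TAB only pays off
                # when it replaces at least two spaces.
                out.append("\t" if pending >= 2 else " ")
                pending = 0
        else:
            out.append(" " * pending)
            pending = 0
            out.append(ch)
            # An existing TAB jumps to the next tab stop.
            col = (col // size + 1) * size if ch == "\t" else col + 1
    out.append(" " * pending)
    return "".join(out)
```

Expanding the result again with the same tabulation size should
give back the original line.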
-d + | -
Convert to or from the ISO 8859/1 8 bit character set
and the Norwegian version of the ISO 646 7 bit charac-
ter set. If the argument is "+", the file is converted
to ISO 8859/1. If the argument is "-", the file is
converted from ISO 8859/1. The ISO 8859/1 character
set is also known as the "DEC Multinational Character
Set".
-e [ 0 | 1 | 2 ]
Interpret ANSI screen control sequences (also known as
ANSI ESCAPE sequences). This function makes pep emu-
late cursor positioning and other functions on an
ANSI-terminal.
Pep will complain about "strange" (i.e. implementation
dependent) use of ANSI escape sequences.
Pep will normally save a screen image on the output
file when one of two events occurs: 1) When the screen
is full and scrolls up; or 2) just before a screen
image is erased with the "erase screen" ANSI screen
control sequence. In some cases important fields on
the screen will be overwritten or erased. There is no
good solution to this problem, but pep provides the
user with some opportunity to guard against overwriting
and erasure. This is done by specifying an additional
numeric argument to the -e option. This number indicates
the level of protection and is interpreted as follows:
0: no protection - fields may be erased and
overwritten (this is the default);
1: sequences that erase fields are ignored;
2: sequences that erase or overwrite fields are
ignored.
-g file
Read the conversion table from a file. The name of the
file must be appended as the argument to this option.
The file itself is a standard ASCII text file where
each line should contain two decimal numbers. The
first number is the character code to convert from, and
the second number is the character code to convert to.
A "#" character and all the following characters up to
a NEWLINE are considered a comment, and are ignored.
Comments are however echoed on the screen along with
the other comments pep makes, unless the comment line
starts with a "##".
Below is an example of how such a conversion file may
look:
# Convert from Macintosh to IBM-PC
##This line is not echoed on the screen.
# MAC IBM
174 146
175 157
129 143
190 145
191 155
140 134
# EOF
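To make the file format concrete, here is a small Python sketch
that reads such a table and applies it to a sequence of bytes. It
follows the description above (two decimal numbers per line, "#"
starts a comment, "##" comments are not echoed); the function names
parse_table and convert are invented for the example and say
nothing about how pep itself is written.

```python
def parse_table(text: str):
    """Return (table, echoed) for the contents of a conversion file."""
    table = {}
    echoed = []
    for raw in text.splitlines():
        body, sep, comment = raw.partition("#")
        # Comments are echoed unless the line starts with "##".
        if sep and not raw.lstrip().startswith("##"):
            echoed.append("#" + comment)
        fields = body.split()
        if len(fields) >= 2:
            table[int(fields[0])] = int(fields[1])
    return table, echoed

def convert(data: bytes, table) -> bytes:
    """Map each byte through the table; unlisted bytes pass unchanged."""
    return bytes(table.get(b, b) for b in data)
```

With the sample file above, the byte 174 is converted to 146 and
any byte not listed in the table is left alone.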
-h Write a brief summary of pep options, and exit.
-i + | -
Convert to or from the IBM 8 bit character set (Code
Page 850 Multilingual) and the Norwegian version of the
ISO 646 7 bit character set. If the argument is "+",
the file is converted to CP 850. If the argument is
"-", the file is converted from CP 850. The CP 850
character set (or a subset of it) is what is used in
the IBM PC, AT, and PS/2 series of computers and their
clones. Note that some machines with American PROMs
have a yen- and cent character in the position right-
fully belonging to upper and lower case versions of the
Norwegian character written as an "o" with a slash
across it (often referred to as oslash).
-k + | -
Convert to or from an 8 bit character set and the ISO
646 7 bit character set. This is a modified version of
the -i function, hacked to preserve both the backslash
character and the upper case oslash character as
required by, among others, the "KnowledgeMan" package.
These characters share the same code (92 decimal) in 7
bit ISO 646, but use different codes (92 is backslash,
157 is oslash) in 8 bit CP 850. To get around this,
two backslashes in ISO 646 will be converted to the
upper case oslash character in CP 850, while a single
backslash will be preserved - and vice versa.
If this option is combined with the -d or -m option,
the DEC/ISO or the Macintosh character set is used as
the base instead of CP 850.
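The backslash rule of -k can be illustrated with the Python sketch
below. It covers only the doubled-backslash convention with CP 850
as the base, not the rest of the folding, and it pairs backslashes
from left to right, which is an assumption. The function names are
hypothetical.

```python
BACKSLASH = 92       # shared code in 7 bit ISO 646
OSLASH_UPPER = 157   # upper case oslash in CP 850

def k_to_8bit(data: bytes) -> bytes:
    """Two backslashes become oslash; a single backslash is preserved."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 2] == bytes([BACKSLASH, BACKSLASH]):
            out.append(OSLASH_UPPER)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def k_to_7bit(data: bytes) -> bytes:
    """The reverse direction: oslash becomes two backslashes."""
    return data.replace(bytes([OSLASH_UPPER]), bytes([BACKSLASH, BACKSLASH]))
```

A path such as C:\DOS survives the conversion, while a doubled
backslash in the 7 bit text turns into the upper case oslash.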
-m + | -
Convert to or from the Apple Macintosh 8 bit character
set and the Norwegian version of the ISO 646 7 bit
character set. If the argument is "+", the file is
converted to the Macintosh character set; if the argu-
ment is "-", the file is converted from the Macintosh
character set. See description of -v option below and
note in "bugs" section below about treatment of "end-
of-line" and "end-of-paragraph".
-o [ b ]
Pep will usually write the result of conversions on the
standard output (stdout). This option instead instructs
pep to replace each named input file with a file con-
taining the result of filtering the file through pep.
If the option is augmented with the argument b (i.e.
-ob), then pep will create a backup copy of the origi-
nal input file on a file with extension .BAK. If you
just specify -o the original file is deleted.
The VMS version of pep will always run as if this
option was specified. This is because VMS does not
support useful redirection or pipes. Therefore, it is
never necessary to specify the -o option under VMS, but
users should still specify -ob if they want a backup
copy of the original input file.
-p Write out a brief description of the conversion functions
that will be activated by the current set of options,
and pause. The user may review the list of conversion
functions and abort (by hitting CTRL-C) if they do not
have the intended effect.
-s [ size ]
Find strings in extremely "noisy" files.
Pep's concept of a string is that it is a sequence of
"printable" characters of a certain length. The
default minimum length of this sequence is 4, but this
may be changed by the user by supplying an optional
numeric argument that becomes the minimum length of the
sequence.
The default definition of a "printable" character is a
symbol with encoding above 31 decimal (i.e. 32 to 255)
plus certain common control characters (TAB, CR and
LF). This definition is almost always too liberal, and
will include a lot of "noise" in the output. One or
more of the options -b, -d, -i, -m or -z should be
specified in addition to -s in order to narrow the
definition and the search space. In my experience, the
-b option is a particularly useful additional filter
when searching for strings.
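The string search described above amounts to collecting runs of
"printable" bytes that are long enough. A minimal Python sketch,
assuming that -b simply narrows the printable range to codes below
128 (the names find_strings and seven_bit are made up for the
example):

```python
def is_printable(b: int, seven_bit: bool) -> bool:
    """Codes 32 and up, plus TAB, LF and CR; seven_bit drops 128 to 255."""
    limit = 127 if seven_bit else 255
    return 32 <= b <= limit or b in (9, 10, 13)

def find_strings(data: bytes, min_len: int = 4, seven_bit: bool = False):
    """Return all runs of printable bytes of at least min_len bytes."""
    found = []
    run = bytearray()
    for b in data:
        if is_printable(b, seven_bit):
            run.append(b)
        else:
            if len(run) >= min_len:
                found.append(bytes(run))
            run.clear()
    if len(run) >= min_len:
        found.append(bytes(run))
    return found
```

Calling find_strings(data, 6, seven_bit=True) corresponds roughly
to combining -s6 with -b.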
-t [ size ]
Expand tabulation, replacing the TAB character with a
suitable number of spaces. The default tabulation size
is 8, but the optional numeric argument size may be
used to set tabulation to any desired size.
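What "a suitable number of spaces" means is shown by the Python
sketch below: each TAB is padded up to the next multiple of the
tabulation size. The function name expand_tabs is an assumption;
Python's built-in str.expandtabs does the same job for lines
without other control characters.

```python
def expand_tabs(line: str, size: int = 8) -> str:
    """Replace each TAB by spaces up to the next tab stop."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = size - col % size   # distance to the next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

With a tabulation size of 4, a TAB after two characters expands to
two spaces.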
-u r | n | s | - | # | number
Pep's default behaviour is to terminate lines with
whatever is the canonical line terminator (the standard
way to terminate a text line) on the assumed target
system for the output file. This means CR/LF on a
microcomputer system, LF on a UNIX system, and CR if
the target is a Macintosh. The assumed target system
is usually the system pep is running on, unless you
request folding to the character set of another com-
puter system. Then, that computer system becomes the
assumed target.
The -u option allows you to override this assumption.
You do this by specifying explicitly (in decimal) the
numeric ASCII value of the end of line character you
want in your output file. For example, to make sure
lines are terminated by LF (the standard for UNIX text
files), you may use -u10, because 10 is the ASCII value
of the newline (LF) control character. Instead of a
numeric argument, you may specify r, for carriage return
(CR), n, for newline (LF), s, for record separator
(RS), the symbol -, for no line terminator, or the symbol
# to get carriage return followed by a newline
(CR/LF).
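The argument forms accepted by -u can be summarized as a lookup
table. The Python sketch below maps each form to the bytes written
at the end of a line; the function name terminator is made up for
the example, and RS is taken to be ASCII 30.

```python
def terminator(arg: str) -> bytes:
    """Translate the argument of -u into the line terminator bytes."""
    named = {
        "r": b"\r",      # carriage return (CR)
        "n": b"\n",      # newline (LF)
        "s": b"\x1e",    # record separator (RS, ASCII 30)
        "-": b"",        # no line terminator
        "#": b"\r\n",    # CR followed by LF
    }
    if arg in named:
        return named[arg]
    return bytes([int(arg)])  # decimal ASCII value, e.g. "10" for LF
```

Under this reading, -u10 and -un select the same terminator.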
-v Normally, pep will terminate each line with the canoni-
cal line terminator. Some typesetting programs and
word processors, however, require that no hard line
terminator is present within a paragraph, and that only
paragraphs are hard terminated. If you want to import
a file to such a typesetting program or word processor,
you may instruct pep to terminate paragraphs only with
this option.
See note in "bugs" section below about treatment of
"end-of-line" and "end-of-paragraph".
-w + | -
This slightly obsolete option converts files to and
from the WordStar version 3.2 "document" mode. If the
argument is "+", the file is converted to WordStar
document mode; if the argument is "-", the file is con-
verted from WordStar document mode into plain ASCII
text.
-x Expand unprintable characters. This option will make
pep expand the characters it would otherwise remove
from the file by printing the character encoding of
these characters in hexadecimal between angle brackets.
-z Zero the eighth bit (a.k.a. the parity bit) on all characters
in the file.
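The -z and -x options are easy to picture in code. The Python
sketch below clears the eighth bit and then writes removable
control characters as hexadecimal between angle brackets. The exact
output format (two upper case digits, as in <1B>) and the set of
characters kept as they are (TAB, LF, FF, CR) are assumptions, as
are the function names.

```python
def zero_eighth_bit(data: bytes) -> bytes:
    """Clear the eighth (parity) bit on every byte."""
    return bytes(b & 0x7F for b in data)

def expand_unprintable(data: bytes) -> str:
    """Show control characters as <XX> instead of removing them."""
    keep = {9, 10, 12, 13}   # TAB, LF, FF, CR pass through
    parts = []
    for b in data:
        if b >= 32 or b in keep:
            parts.append(chr(b))
        else:
            parts.append(f"<{b:02X}>")
    return "".join(parts)
```

An ESC character (code 27) would then show up as <1B> in the
output.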
ENVIRONMENT
Pep knows a single environment variable: PEP, which may be
used to indicate the lookup path for files with conversion
tables. Below are some examples of how to set this in some
operating systems:
set PEP=c:\usr\lib (MS-DOS)
setenv PEP /usr/local/lib (UNIX)
define PEP "DISK_USR:<LOCAL.LIB>" (VMS)
The command to set this environment variable should usually
be part of the command file that is read during login (this
may be named AUTOEXEC.BAT, LOGIN.COM, .profile or .login
depending upon your choice of operating system). Please note
that environment variables do not exist under CP/M.
EXAMPLES
Some of the examples below use i/o redirection and pipes, as
indicated with the symbols ">" and "<" (redirection) and "|"
(pipe symbol). These examples only apply to operating sys-
tems that support redirection and pipes.
pep -h
Print a quick summary of all available options, and exit.
pep
Read input from standard input (the keyboard), and write
the result on standard output (the screen) until the user
types the end of file character (usually CTRL-D (UNIX) or
CTRL-Z (MS-DOS)). This is of limited practical use by
itself; usually this command is inserted into the middle
of a command where the standard input and standard output
are pipes.
pep < foo.bar
Display a slightly cleaned-up version of the file foo.bar
on the screen.
pep < foo.bar > foo.txt
Read the file foo.bar, clean it, and write the result on
the file foo.txt.
pep foo.bar > foo.txt
Read the file foo.bar, clean it, and write the result on
the file foo.txt.
pep foo1.bar foo2.bar > foo.txt
Read the files foo1.bar and foo2.bar, clean them, and
catenate the result on the file foo.txt.
pep -o foo.fil bar.fil
Clean the files foo.fil and bar.fil, replacing the origi-
nal files with the cleaned-up versions.
pep -ob foo.fil bar.fil
Clean the files foo.fil and bar.fil, replacing the origi-
nal files with the cleaned-up versions. The original
files are preserved as foo.bak and bar.bak.
pep -i+ -o program.dok
Convert the Norwegian text in the file program.dok to use
the IBM-PC 8 bit character set. Please note that this
conversion may not be 100 percent correct. For instance,
the pipe symbol "|" will be converted to the lower case
Norwegian oslash character. This is because the pipe
symbol and the character share the same ASCII code (124)
in the Norwegian version of the 7-bit character set, but
they have different codes when using 8-bit character
sets.
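The collision described above can be reproduced with a small
Python sketch. The table lists only the six Norwegian national
letters and their CP 850 codes; whether pep's built-in table
covers more positions is not stated here, so treat the table and
the function name to_cp850 as assumptions.

```python
# Norwegian ISO 646 code -> CP 850 code
NORWEGIAN_TO_CP850 = {
    91: 146,    # "[" position: upper case AE
    92: 157,    # "\" position: upper case oslash
    93: 143,    # "]" position: upper case A with ring
    123: 145,   # "{" position: lower case ae
    124: 155,   # "|" position: lower case oslash
    125: 134,   # "}" position: lower case a with ring
}

def to_cp850(data: bytes) -> bytes:
    """Fold Norwegian 7-bit text to CP 850, roughly like -i+."""
    return bytes(NORWEGIAN_TO_CP850.get(b, b) for b in data)
```

Every byte with code 124 is converted, so a pipe symbol in a
command line embedded in the text comes out as a lower case
oslash too.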
pep -e2 -o kermit.log
Interpret ANSI screen control sequences in the file
kermit.log. Set guard to level 2 (no deletion or
overwriting).
In this example, it is assumed that the file kermit.log
is a log record of an on-line session with some Bulletin
Board System (BBS). Such files may be created with the
command "log session" in the popular kermit communication
program. Most other communication programs have similar
commands. Many BBSs use ANSI sequences for simple
graphics, highlighting and other special effects, and you
will get a much more readable session log if you run
it through pep with the -e option turned on.
test | pep -e > test.scr
Run the program test, and pipe its output to pep, which
interprets any ANSI sequences and stores the resulting
screen images in the file test.scr. Note that this is
only possible on operating systems that support pipes
(e.g. UNIX and MS-DOS).
The screen images will now be on standard text files
which have the same general layout as the original screen
images. This may be useful if you need text versions of
the screen images for inclusion in manuals or for proto-
types.
nroff -man -Tlpr pep.1l | pep > pep.doc
Generate a plain text version of this manual, without
backspaces or double strikes (nroff is the standard Unix
text formatter).
pep -d- -o *.txt
Convert all files with extension .txt from DEC/ISO char-
acter set to Norwegian 7-bit ASCII characters.
pep -gibm2mac -ur < foo.ibm > foo.mac
Use the conversion table in the file ibm2mac to convert
the character set in the file foo.ibm. Store the result
on the file foo.mac, where each line should be terminated
by a single CR character.
pep -m- < foo.mac | pep -i+ > foo.ibm
Convert Apple Macintosh encoded Norwegian characters in
the file foo.mac to IBM-PC (Code Page 850) encoding.
This is an alternative way to accomplish the same thing
as the conversion done in the previous example.
pep -w- -o *.*
Convert all files in the current directory from WordStar
document mode to 7-bit ASCII.
pep -w+ -t4 < foo.txt > foo.ws
Convert the file foo.txt to WordStar document mode for-
mat, also expanding tabulation (tabstop = 4) to space
characters. The result is stored on a file named foo.ws.
Pep uses a simple pattern recognition mechanism to recog-
nize pages, paragraphs, soft white space and soft
hyphens. It will probably not do a 100% conversion, but
the file will be much easier to edit in WordStar than the
original.
pep -z -x < foo.dat > foo.dmp
Strip the 8th bit and expand control characters to hex
digits in the file foo.dat, and store the result on the
file foo.dmp.
Expanding the unprintable characters to hexadecimal makes
it easier to inspect a file in an ordinary text editor,
and to post-process it by a customized filter you may
create yourself with the search/replace and macro facili-
ties found in many editors today.
pep -s6 -b < pep.exe
Extract "strings" from the file pep.exe. The strings are
just listed on standard output (the screen). "Strings"
are in this context assumed to be any sequence of charac-
ters that are at least 6 characters long. The -b option
excludes characters with codes in the range 128 to 255
from the search. It is almost always a good idea to combine
the -b option with the -s option; otherwise too much garbage
is picked up by the filter.
pep -t4 -c8 -o foo.c
If both tab expansion -t and tab compression -c are specified,
then pep will repack the tabulation. This is useful
if you want to convert a file from one tab-size to
another (e.g. to convert non-standard 4 character tabulation
into standard 8 character tabulation). In this
example, two TAB characters in the file foo.c are
replaced by a single TAB character, and any TAB character
that cannot be paired up is replaced by the appropriate
number of spaces.
pep -t -c -o foo.c
Remove redundant space characters in existing tabulation
in the file foo.c. What happens is that tabulation on
each line is first expanded and then compressed again,
which effectively removes any space characters "inside" a
tabulation.
DIAGNOSTICS
If you specify an option that pep does not recognize, then
pep will write a summary of usage and abort. Other errors
on the command line will result in pep writing an error mes-
sage before aborting.
On operating systems that support exit codes, pep will
return an exit code upon termination.
If pep is interpreting ANSI escape sequences and notices
syntactical or semantical errors in the way they are used, a
warning is printed on the screen, prefixed with the string
"ansi:". This means that it is also possible to use pep to
check if programs use ANSI sequences in a portable way.
FILES
pep, pep.exe, pep.cmd
executable file (actual name depends upon which
operating system you use).
mac2ibm small example of a user supplied conversion table
to convert from the Macintosh character set to
that used on the Norwegian version of the original
IBM-PC (the sample file only covers the Norwegian
characters - to complete it is left as an exercise
to the reader :-) ).
ibm2mac inverse of mac2ibm: conversion table from a small
subset of IBM CP 850 to Macintosh character set.
ebc2ns7 conversion table from the IBM EBCDIC character set
to the Norwegian version of the ASCII 7-bit char-
acter set (ISO646 NS4551).
ibm2ro8 conversion table from the IBM-PC 8-bit character
set to Hewlett-Packard ROMAN8.
ro82ibm inverse of ibm2ro8: conversion table from ROMAN8
to IBM-PC character set.
ibm2iso conversion table from the IBM-PC CP 850 8-bit
character set to ISO 8859/1.
iso2ibm inverse of ibm2iso: conversion table from ISO
8859/1 to CP 850.
AUTHOR
Copyright (C) 1989 Gisle Hannemyr.
Pep may be freely distributed and copied, as long as this
file is included in the distribution and these statements
about authorship and copyright are not altered or
removed.
Bug reports, improvements, comments, suggestions and flames
to:
Snail: Gisle Hannemyr, Brageveien 3A, 0452 Oslo, Norway.
Email: gisle@nr.uninett (EAN);
gisle@ifi.uio.no (Internet);
...!mcvax!ifi!gisle (UUCP);
(and several BBS mailboxes).
ACKNOWLEDGMENTS
Thanks to Robert Andersson, for the SYS-V rename function;
and to Knut Borge, Bjoern Larsen, Knut Omang and Geir-Harald
Strand, for elucidation of the unspeakable mysteries of
VMS. Special thanks are due Inge Arnesen for finding and
fixing a bug (and to Nils-Eivind Naas for bringing it to my
attention).
Several people have contributed ideas and/or bug reports.
In addition to those mentioned above, Ola Garstad, Ottar
Grimstad, Tor Sjoewall, and Jens-Henrik Soerensen should be
mentioned. My apologies if anyone is forgotten.
SEE ALSO
dd(1), detex(1L), convert(VMS), expand(1), od(1V),
strings(1), tr(1), unexpand(1).
Detex(1L) is a lex-based program to convert LaTeX and TeX
manuscripts into plain ASCII text. It is available from the
author upon request. Those marked VMS are standard VMS
utilities. The others are standard UNIX utilities.
BUGS
There is a very strong Norwegian bias in pep. In particular,
there exist several national versions of the ISO 646 7-bit
character set; but all built-in functions to convert between
this and various 8-bit character sets (i.e. -d, -i, -k and
-m) bluntly assume the standard Norwegian version of
ISO 646. For pep to work with other national 7-bit character
sets, the compiled-in conversion tables (type FOLDMATRIX for
those who read the source code) need to be extended.
The VMS version of pep runs with the -o option permanently
enabled. This is because VMS does not support a useful i/o
redirection or pipe mechanism.
The VMS Record Management Service (RMS) knows of several
record formats. You can see what record format a file has by
using the VMS DCL command DIRECTORY/FULL and examining the
field "Record format". On VMS systems, pep will always generate
output files with record format set to "Stream_LF",
but some programs may require that the output file is in
other formats. To fix this, it might be necessary to run
the output of pep through the VMS CONVERT utility. Please
see the DEC VMS manuals for details.
The Macintosh "text only" format uses the carriage return
(CR) character (ASCII 13) as terminator. Most text processors
(e.g. MacWrite) seem capable of handling two conventions:
one is to use CR to terminate each line (and two or
more consecutive CRs between paragraphs); the other is to
use CR between paragraphs only. Pep is also capable of handling
both conventions. The default behaviour is to terminate
each line, but the -v option may be used to terminate
paragraphs only. Please note that pep uses a rather
simplistic heuristic to identify the end of a paragraph: it
bluntly assumes that paragraphs are separated by blank
lines.
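The blank-line heuristic can be sketched in Python as follows.
How the lines of a paragraph are joined (a single space here) and
what is written between paragraphs (one terminator here) are
assumptions made for the example, as is the function name
paragraphs_only.

```python
def paragraphs_only(text: str, terminator: str = "\r") -> str:
    """Terminate paragraphs only; a blank line ends a paragraph."""
    paragraphs = []
    current = []
    for line in text.split("\n"):
        if line.strip() == "":
            # A blank line closes the paragraph collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
        else:
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return "".join(p + terminator for p in paragraphs)
```

Text that marks paragraphs by indentation rather than by blank
lines would come out as one long paragraph under this heuristic.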
If you use the -o option, then the original input file will
be overwritten. Before you are familiar with pep, you may
find that it sometimes removes more material than you expect
from a file. It may be a good idea to always make a copy of
the original file before you start experimenting with pep,
or you may add the "b" argument to the -o option (-ob).
The built-in IBM-PC, DEC and Macintosh conversion tables
convert to and from the Norwegian version of 7-bit "ASCII"
characters. You should use the -g option and "general"
conversion tables for all other purposes.
Pep only knows the ANSI sequences implemented in the stan-
dard MS-DOS console driver ANSI.SYS.
There cannot be a space character between an option and the
option's argument (e.g. you'll have to use "-gfoo.bar", not
"-g foo.bar").
Pep will only filter "regular" files. It will skip direc-
tories, sockets and "special" files.
Links are the GOTOs of file systems. If you run a hard
linked file through pep using the -o option, the link will
not be preserved. Pep will just skip soft linked files.
Pep searches for the conversion tables requested with the -g
option in the following order: first the current directory,
then the directory of the file PEP.EXE (MS-DOS only), and
finally the directory pointed to by the PEP environment
variable.
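The search order can be expressed as a short Python sketch. The
function name find_table and the exe_dir parameter (standing in
for the directory of PEP.EXE on MS-DOS) are hypothetical; only the
order of the three locations is taken from the text above.

```python
import os

def find_table(name, exe_dir=None, env=os.environ):
    """Return the first existing path for a conversion table, or None."""
    candidates = [os.curdir]           # 1. the current directory
    if exe_dir:
        candidates.append(exe_dir)     # 2. directory of PEP.EXE (MS-DOS only)
    if env.get("PEP"):
        candidates.append(env["PEP"])  # 3. the PEP environment variable
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

A table in the current directory therefore shadows one with the
same name in the directory named by PEP.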
Pep knows nothing about the COFF-format and the -s option is
primitive compared to the UNIX command strings(1). So if
you are on a UNIX-system - forget about the -s option and
use strings(1) instead.
Pep will not convert Word Perfect documents into plain
ASCII. This much requested function is, however, built into
Word Perfect. It is named "store as DOS-text" and is
activated by pressing CTRL-F5 (at least in Word Perfect
4.2).