PEP
Section: Misc. Reference Manual Pages (1L)
Updated: 28 December 1989
NAME
pep - a file detergent
SYNOPSIS
pep [ -a ] [ -b ] [ -c [ size ]] [ -d + | - ] [ -e [ 0 | 1 | 2 ]]
    [ -g file ] [ -h ] [ -i + | - ] [ -k + | - ] [ -m + | - ] [ -o [ b ]]
    [ -p ] [ -s [ size ]] [ -t [ size ]] [ -u terminator ] [ -v ]
    [ -w + | - ] [ -x ] [ -z ] [ filename ... ]
DESCRIPTION
Pep
is a filter program to "clean" files. It is named after a
popular Norwegian detergent.
Pep
may be used to remove control characters, strip parity bits,
interpret ANSI escape sequences, compress tabulation,
extract strings and convert character sets. Nine out of ten hackers
prefer "pep" to soap (which may very well explain why some of
them smell the way they do).
Pep
is a filter. Its default operation is to read from standard input
(the keyboard) and write on standard output (the terminal).
You may also specify the name of one or more files as the last
argument on the command line. Most versions of pep
(not the version compiled for the DEC VMS operating system)
allow ambiguous filename arguments, where a single filename
argument may specify several files.
You may instruct
pep
to write the result back onto the original input file with the
-o
option. If you use this option, the original file will be lost.
If you want to keep the original file (something that usually will
be the case when you do things like extracting strings from an
executable file), you should make a copy of the file before applying
pep,
and filter the copy rather than the original.
Some of the functions in
pep
(in particular those selected with the
-b
and
-s
options) may remove a lot of material from files, and it may be unfortunate if
this happens to the wrong file. It is probably a good idea to always use
pep
on copies until you have some experience with the various
pep-options.
You may also use the
b
argument on the
-o
option to save the original in a .BAK-file.
To get a brief summary of the command line syntax and all the options,
you need to specify the
-h
option. Just type the command:
-
pep -h
followed by the RETURN key. Note that just
pep
will not give you this summary. The command:
-
pep
will start
pep
as a filter, and it will just echo back whatever you type, until you
type the end of file character (usually CTRL-D or CTRL-Z).
When pep is running as a filter, it reads from the standard input and
writes to the standard output. In this state, pep
is much less verbose than it usually is. It will still
print error messages, but very little else. Note that while:
-
pep < foobar.in > foobar.out
pep -ob foobar.txt
will do more or less the same job, the first will do it quietly,
in the tradition of Unix filters; the latter will print the
copyright notice, a detailed list of the things it will do,
and finally a list and line count
of all the files it processes as it plods along.
Pep
will remove some "noise" from files, even if no options are specified.
The following is the default behavior:
* remove trailing spaces;
* terminate each line with the canonical line terminator (usually LF, CR or both);
* remove underlining intended for backspacing printers;
* remove control characters (character codes < 32) except the canonical line
  terminator, FF and TAB;
* break the line before the FF if a line contains an FF anywhere except in the
  first column.
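The default behavior listed above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not pep's actual implementation: the function name clean_text is hypothetical, underlining is assumed to appear as an underscore paired with a backspace, and LF is assumed to be the canonical line terminator unless another one is passed in.

```python
import re

FF = "\x0c"  # form feed

def clean_text(text, terminator="\n"):
    """Approximate pep's default clean-up (hypothetical helper, not pep's code)."""
    out = []
    for line in re.split(r"\r\n|\r|\n", text):
        # Assumed underlining style: "_\bX" or "X\b_" becomes "X".
        line = re.sub(r"_\x08", "", line)
        line = re.sub(r"\x08_", "", line)
        # Drop control characters (codes < 32) except TAB and FF.
        line = "".join(c for c in line if ord(c) >= 32 or c in "\t" + FF)
        # Break the line before any FF that is not in the first column.
        pieces = re.sub(r"(?<=.)\x0c", "\n" + FF, line).split("\n")
        # Remove trailing spaces from every resulting line.
        out.extend(p.rstrip(" ") for p in pieces)
    return terminator.join(out)
```

Passing terminator="\r\n" or "\r" mimics a different target system.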
If you want to check what pep
actually intends to do to your file before it does it, you may make it
pause with the -p option. For example:
-
pep -p foobar.txt
will make
pep
stop after displaying a list of the conversions it will apply to the
file. The user is prompted and may choose to proceed
(hitting the RETURN key), or abort
the program without doing anything (hitting CTRL-C).
The user may want other conversions than the default action described
above. A number of conversion functions may be selected by specifying one or
more options on the command line.
Some of the options require an additional argument switch, and must be
followed by a "+" or a "-"; other options
require a number or a filename argument.
Most of the options may be combined with other options, but a few are
mutually exclusive. If the user specifies invalid options or option
arguments, then
pep
will abort with an error message and return an error exit code on
operating systems that support exit codes.
OPTIONS
- -a
-
Write out information about
pep.
- -b
-
Remove all characters not in the original 7-bit character set (ISO 646).
I.e. remove the characters which are encoded from 128 to 255.
(If this option is combined with the
-x
option, it will print the codes for these characters in hexadecimal
instead of removing them.)
The -b
option is powerful, and may remove a lot of bytes if you use it
on the wrong file. Only use it if you know exactly how the eighth bit is
used in the file you intend to filter. Also note that the options
-i, -d, -k, -g, -m, -w or -z
in most cases are better suited to
process files where the eighth bit is set.
- -c [ size ]
-
Compress space into tabulation. I.e. insert TAB characters when
replacing a run of two or more SPACE characters would produce a
smaller output file.
This function is the opposite of the function invoked with the
-t
option.
-
The default tabulation size is 8,
but you may specify any other tabulation with the optional numeric
argument.
- -d + | -
-
Convert to or from the ISO 8859/1 8 bit character set and the Norwegian
version of the ISO 646 7 bit character set. If the argument is "+",
the file is converted
to
ISO 8859/1. If the argument is "-",
the file is converted
from
ISO 8859/1. The ISO 8859/1 character set is nearly identical to
the "DEC Multinational Character Set".
- -e [ 0 | 1 | 2 ]
-
Interpret ANSI screen control sequences (also known as ANSI ESCAPE
sequences). This function makes
pep
emulate cursor positioning and other functions on an ANSI-terminal.
-
Pep
will complain about "strange" (i.e. implementation dependent) use of
ANSI escape sequences.
-
Pep
will normally save a screen image on the output file when one of
two events occurs: 1) when the screen is full and scrolls up;
or 2) just before a screen image is erased with the "erase screen"
ANSI screen control sequence. In some cases important fields
on the screen will be overwritten or erased. There
is no good solution to this problem, but pep
provides the user with some opportunity to guard against overwriting
and erasure. This is done by specifying an additional numeric argument
to the -e
option. This number indicates the level of protection
and is interpreted as follows:
-
-
- 0:
-
no protection --- fields may be erased and overwritten
(this is the default);
- 1:
-
sequences that erase fields are ignored;
- 2:
-
sequences that erase or overwrite fields are ignored.
- -g file
-
Read the conversion table from a file. The name of the file must be
appended as the argument to this option.
-
The file itself is a standard ASCII text file where each line should
contain two decimal numbers. The first number is the character code
to convert from,
and the second number is the character code to convert to.
A "#" character and all the following characters up to a NEWLINE are
considered a comment, and are ignored. Comments are however echoed
on the screen along with the other comments pep
makes, unless the comment line starts with a "##".
-
Below is an example of how such a conversion file may look:
# Convert from Macintosh to IBM-PC
##This line is not echoed on the screen.
# MAC IBM
174 146
175 157
129 143
190 145
191 155
140 134
# EOF
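To make the table format concrete, here is a minimal Python sketch of a reader for such a file. It is not pep's own parser: the names parse_conversion_table and apply_table are hypothetical, and the sketch assumes that character codes missing from the table pass through unchanged.

```python
def parse_conversion_table(text):
    """Parse a -g style table: two decimal codes per line, '#' starts a comment.

    Returns (table, echoed): a dict mapping source code to target code, and
    the comments that would be echoed (lines starting with '##' are not).
    """
    table, echoed = {}, []
    for raw in text.splitlines():
        body, hash_mark, comment = raw.partition("#")
        if hash_mark and not raw.lstrip().startswith("##"):
            echoed.append(comment.strip())
        fields = body.split()
        if len(fields) >= 2:
            table[int(fields[0])] = int(fields[1])
    return table, echoed

def apply_table(data, table):
    """Translate bytes through the table (assumption: unlisted codes are kept)."""
    return bytes(table.get(b, b) for b in data)
```

With the example table above, apply_table would map code 174 to 146 and leave plain 7-bit letters alone.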
- -h
-
Write a brief summary of
pep
options, and exit.
- -i + | -
-
Convert to or from the IBM 8 bit character set (Code Page 850 Multilingual)
and the Norwegian
version of the ISO 646 7 bit character set. If the argument is "+",
the file is converted
to
CP 850. If the argument is "-",
the file is converted
from
CP 850. The CP 850 character set (or a subset of it)
is what is used in the IBM PC, AT, and PS/2 series of
computers and their clones. Note that some machines with
American PROMs have a yen and a cent character in
the positions rightfully belonging to the upper and lower case
versions of the Norwegian character
written as an "o" with a slash across it (often referred to as
oslash).
- -k + | -
-
Convert to or from an 8 bit character set and the
ISO 646 7 bit character set. This is a modified version
of the -i
function, hacked to preserve both the backslash
character and the upper case oslash
character as required by, among others, the "KnowledgeMan" package. These
characters share the same code (92 decimal) in 7 bit ISO 646,
but use different codes (92 is backslash, 157 is oslash) in
8 bit CP 850. To get around this, two backslashes in ISO 646
will be converted to the upper case oslash character in CP 850, while
a single backslash will be preserved, and vice versa.
-
If this option is combined with the -d or -m
option, the DEC/ISO or the Macintosh character set is used as base
instead of CP 850.
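The backslash rule of the -k option can be illustrated with a small Python sketch. It covers only the backslash/oslash pair with CP 850 as the base, not the rest of the character set conversion, and the function names are hypothetical.

```python
BACKSLASH = 92     # backslash in both ISO 646 and CP 850
OSLASH_850 = 157   # upper case oslash in CP 850

def k_to_cp850(data):
    """ISO 646 to CP 850: a doubled backslash becomes oslash, a single one is kept."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == BACKSLASH and data[i + 1:i + 2] == b"\\":
            out.append(OSLASH_850)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def k_from_cp850(data):
    """CP 850 to ISO 646: oslash becomes a doubled backslash."""
    return data.replace(bytes([OSLASH_850]), b"\\\\")
```

A path such as C:\DOS keeps its single backslash in both directions, which is the point of the option.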
- -m + | -
-
Convert to or from the Apple Macintosh 8 bit character set and the Norwegian
version of the ISO 646 7 bit character set. If the argument is "+",
the file is converted
to
the Macintosh character set; if the argument is "-",
the file is converted
from
the Macintosh character set.
See description of
-v
option below and
note in "bugs" section below about treatment of "end-of-line" and
"end-of-paragraph".
- -o [ b ]
-
Pep
will usually write the result of conversions on the standard output
(stdout).
This option instead instructs
pep
to replace each named input file with a file containing the result
of filtering the file through
pep.
If the option is augmented with the argument
b
(i.e.
-ob),
then
pep
will create a backup copy of the original input file on a file
with extension .BAK. If you just specify
-o
the original file is deleted.
-
The VMS version of
pep
will always run as if this option was specified. This is because
VMS does not support useful redirection or pipes. Therefore, it is never
necessary to specify the
-o
option under VMS, but users should still specify
-ob
if they want a backup copy of the original input file.
- -p
-
Write out a brief description of the conversion functions that
will be activated by the current
set of options, and pause. The user may review the list of
conversion functions and abort (by hitting CTRL-C) if they do not have
the intended effect.
- -s [ size ]
-
Find strings in extremely "noisy" files.
-
Pep's
concept of a string is that it is a sequence of "printable" characters
of a certain length. The default minimum length of this sequence is
4, but this may be changed by the user by supplying an optional
numeric argument that becomes the minimum length of the sequence.
-
The default definition of a "printable" character is a symbol with
encoding above 31 decimal (i.e. 32 to 255) plus certain
common control characters (TAB, CR and LF). This definition
is almost always too liberal, and will include a lot of "noise" in
the output. One or more of the options
-b, -d, -i, -m
or
-z
should be specified in addition to
-s
in order to narrow the definition and the search space.
In my experience, the
-b
option is a particularly
useful additional filter when searching for strings.
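The string search described above can be sketched in Python as follows. This is an illustration of the stated definition, not pep's code: extract_strings is a hypothetical name, and the seven_bit flag is an assumption about how combining -s with -b narrows the definition of "printable".

```python
def extract_strings(data, min_len=4, seven_bit=False):
    """Yield runs of "printable" bytes that are at least min_len long.

    Printable means codes 32..255 plus TAB, LF and CR; seven_bit=True
    narrows this to codes below 128 (assumed effect of adding -b).
    """
    upper = 127 if seven_bit else 255
    run = bytearray()
    for b in data:
        if 32 <= b <= upper or b in (9, 10, 13):
            run.append(b)
            continue
        if len(run) >= min_len:
            yield bytes(run)
        run.clear()
    if len(run) >= min_len:
        yield bytes(run)
```

Calling list(extract_strings(open("pep.exe", "rb").read(), 6, True)) would roughly correspond to the -s6 -b combination.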
- -t [ size ]
-
Expand tabulation, replacing the TAB character with a suitable number
of spaces. The default tabulation size is 8, but the optional
numeric argument
size
may be used to set tabulation to any desired size.
- -u r | n | s | - | # | number
-
Pep's
default behaviour is to terminate lines with whatever is the
canonical line terminator (the standard way to terminate
a text line) on the assumed target system for the output file.
This means CR/LF on a microcomputer system, LF on a UNIX system,
and CR if the target is a Macintosh. The assumed target system
is usually the system
pep
is running on, unless you request folding to the character set
of another computer system. Then, that computer system becomes
the assumed target.
-
The -u
option allows you to override this assumption.
You do this by explicitly specifying the numeric ASCII
value (in decimal) of the end of line character you want in your output file.
For example, to make sure
lines are terminated by LF (the standard for UNIX text files),
you may use -u10,
because 10 is the ASCII value of the newline (LF) control character.
Instead of a numeric argument, you may specify
r, for carriage return (CR),
n, for newline (LF),
s, for record separator (RS),
the symbol -, for no line terminator,
or the symbol # to get carriage return followed by a newline (CR/LF).
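The argument forms accepted by -u can be summarized as a small lookup, sketched here in Python. The function name line_terminator is hypothetical, and the 0 to 255 range check on numeric arguments is an assumption, not documented behaviour.

```python
def line_terminator(arg):
    """Map a -u argument to the bytes written at the end of each line."""
    named = {
        "r": b"\r",      # carriage return (CR)
        "n": b"\n",      # newline (LF)
        "s": b"\x1e",    # record separator (RS, ASCII 30)
        "-": b"",        # no line terminator
        "#": b"\r\n",    # CR followed by LF
    }
    if arg in named:
        return named[arg]
    code = int(arg)      # otherwise a decimal ASCII value, e.g. "10" for LF
    if not 0 <= code <= 255:
        raise ValueError("terminator code out of range: %d" % code)
    return bytes([code])
```

So -u10 and -un select the same terminator, as do -u13 and -ur.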
- -v
-
Normally,
pep
will terminate each line with the canonical line terminator.
Some typesetting programs and word processors, however, require
that no hard line terminator is present within a paragraph, and
that only paragraphs are hard terminated. If you want to
import a file to such a typesetting program or word processor,
you may instruct
pep
to terminate paragraphs
only
with this option.
-
See note in "bugs" section below about treatment of "end-of-line" and
"end-of-paragraph".
- -w + | -
-
This slightly obsolete option converts files to and from the
WordStar version 3.2 "document" mode. If the argument is "+",
the file is converted
to
WordStar document mode; if the argument is "-",
the file is converted
from
WordStar document mode into plain ASCII text.
- -x
-
Expand unprintable characters. This option
will make
pep
expand the characters it would otherwise remove from the file by
printing the character encoding of these characters in
hexadecimal between angle brackets.
- -z
-
Zero the eighth bit (a.k.a. the parity bit) on all characters in the file.
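The effect of -z and -x can be sketched in Python. The manual only says that codes are printed in hexadecimal between angle brackets, so the exact two-digit upper case form used here is an assumption, and both function names are hypothetical.

```python
def strip_eighth_bit(data):
    """Clear the eighth bit of every byte, as -z is described to do."""
    return bytes(b & 0x7F for b in data)

def expand_unprintable(data, seven_bit=False):
    """Show characters that would otherwise be removed as <XX> (assumed format).

    Control codes below 32 are expanded, except TAB, LF, FF and CR.
    seven_bit=True also expands codes 128..255 (the -b -x combination).
    """
    keep = {9, 10, 12, 13}
    out = []
    for b in data:
        removable = (b < 32 and b not in keep) or (seven_bit and b >= 128)
        out.append("<%02X>" % b if removable else chr(b))
    return "".join(out)
```

Chaining the two, as in expand_unprintable(strip_eighth_bit(data)), mirrors the -z -x example further down.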
ENVIRONMENT
Pep
knows a single environment variable:
PEP,
which may be
used to indicate the lookup path for files with conversion
tables. Below are some examples of how to set this on some
operating systems:
-
set PEP=c:\usr\lib (MS-DOS)
setenv PEP /usr/local/lib (UNIX)
define PEP "DISK_USR:<LOCAL.LIB>" (VMS)
The command to set this environment variable should usually be
part of the command file that is read during login (this may
be named AUTOEXEC.BAT, LOGIN.COM, .profile or .login
depending upon your choice of operating system). Please note
that environment variables do not exist under CP/M.
EXAMPLES
Some of the examples below use i/o redirection and pipes,
as indicated with the symbols ">" and "<" (redirection)
and "|" (pipe symbol). These examples
only apply to operating systems that support
redirection and pipes.
- pep -h
-
Print a quick summary of all available options, and exit.
- pep
-
Read input from standard input (the keyboard), and write
the result on standard output (the screen) until the user
types the end of file character (usually CTRL-D (UNIX) or
CTRL-Z (MS-DOS)). This is of limited practical use by
itself; usually this command is inserted into the middle of a
command line where the standard input and standard output are pipes.
- pep < foo.bar
-
Display a slightly cleaned-up version of the file
foo.bar
on the screen.
- pep < foo.bar > foo.txt
-
Read the file
foo.bar,
clean it, and write the result on the file
foo.txt.
- pep foo.bar > foo.txt
-
Read the file
foo.bar,
clean it, and write the result on the file
foo.txt.
- pep foo1.bar foo2.bar > foo.txt
-
Read the files
foo1.bar
and
foo2.bar,
clean them, and
catenate the result on the file
foo.txt.
- pep -o foo.fil bar.fil
-
Clean the files
foo.fil
and
bar.fil,
replacing the
original files with the cleaned-up versions.
- pep -ob foo.fil bar.fil
-
Clean the files
foo.fil
and
bar.fil,
replacing the
original files with the cleaned-up versions. The original
files are preserved as
foo.bak
and
bar.bak.
- pep -i+ -o program.dok
-
Convert the Norwegian text in the file
program.dok
to use
the IBM-PC 8 bit character set. Please note that this
conversion may not be 100 percent correct. For instance,
the pipe symbol "|" will be converted to the lower case Norwegian
oslash
character.
This is because the pipe symbol and the character share the
same ASCII code (124) in the Norwegian version of the 7-bit character
set, but they have different codes when
using 8-bit character sets.
- pep -e2 -o kermit.log
-
Interpret ANSI screen control sequences in the file
kermit.log.
Set guard to level 2 (no deletion or overwriting).
-
In this example, it is assumed that the file
kermit.log
is a log record of an on-line session with some Bulletin Board System (BBS).
Such files may be created with the command "log session" in the popular
kermit
communication program. Most other communication programs have
similar commands. Many BBSs use
ANSI sequences for simple graphics, highlighting and
other special effects, and you will get a much
more readable session log if you run it through pep
with the -e
option turned on.
- test | pep -e > test.scr
-
Run the program
test,
and pipe its output to
pep,
which interprets any ANSI sequences and stores the resulting screen
images in the file test.scr.
Note that this is only
possible on operating systems that support pipes (e.g. UNIX and MS-DOS).
-
The screen images will now be on standard text files which have the same
general layout as the original screen images. This may be useful if
you need text versions of the screen images for inclusion in manuals or
for prototypes.
- nroff -man -Tlpr pep.1l | pep > pep.doc
-
Generate a plain text version of this manual, without
backspaces or double strikes
(nroff
is the standard Unix text formatter).
- pep -d- -o *.txt
-
Convert all files with extension
.txt
from DEC/ISO character set to Norwegian 7-bit ASCII characters.
- pep -gibm2mac -ur < foo.ibm > foo.mac
-
Use the conversion table in the file
ibm2mac
to convert
the character set in the file
foo.ibm.
Store the result on the file
foo.mac,
where each line should be terminated by a single CR character.
- pep -m- < foo.mac | pep -i+ > foo.ibm
-
Convert Apple Macintosh encoded Norwegian characters in the file
foo.mac
to IBM-PC (Code Page 850) encoding. This uses the built-in tables
and converts in the opposite direction of the previous
example.
- pep -w- -o *.*
-
Convert all files in the current directory from WordStar document
mode to 7-bit ASCII.
- pep -w+ -t4 < foo.txt > foo.ws
-
Convert the file
foo.txt
to WordStar document mode format, also expanding tabulation (tabstop = 4)
to space characters. The result is stored on a file named
foo.ws.
Pep
uses a simple pattern recognition mechanism to recognize pages,
paragraphs, soft white space and soft hyphens. It will probably
not do a 100% conversion, but the file will be much easier to
edit in WordStar than the original.
- pep -z -x < foo.dat > foo.dmp
-
Strip the 8th bit and expand control characters to hex
digits in the file
foo.dat,
and store the result on the file
foo.dmp.
-
Expanding the unprintable characters to hexadecimal makes it easier to
inspect a file in an ordinary text editor, and to post-process it
by a customized filter you may create yourself
with the search/replace and macro
facilities found in many editors today.
- pep -s6 -b < pep.exe
-
Extract "strings" from the file
pep.exe.
The strings are just listed on standard output (the screen).
"Strings" are in this context assumed to be any sequence of characters
that are at least 6 characters long. The
-b
option excludes characters with codes in the range 128 to 255 from
the search. It is almost always a good idea to combine the -b
option with the -s
option, otherwise too much garbage is picked up by the filter.
- pep -t4 -c8 -o foo.c
-
If both tab expansion (-t)
and tab compression (-c)
are specified, then pep
will repack the tabulation. This is useful if you want to convert
a file from one tab-size to another (e.g. to convert non-standard
4 character tabulation into standard 8 character tabulation).
In this example, two TAB characters in the file foo.c
are replaced by a single TAB character, and any TAB character that cannot be
paired up is replaced by the appropriate number of spaces.
- pep -t -c -o foo.c
-
Remove redundant space characters in existing tabulation in the file
foo.c.
What happens is that tabulation on each line is first expanded and
then compressed again, which effectively
removes any space characters "inside" a tabulation.
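The two tabulation examples above can be mimicked with a short Python sketch. It is a simplified model, not pep's algorithm: the function names are hypothetical, and compress_spaces assumes its input contains no TAB characters (that is, it has already been expanded).

```python
def expand_tabs(line, size=8):
    """Replace each TAB with spaces up to the next tab stop (like -t)."""
    return line.expandtabs(size)

def compress_spaces(line, size=8):
    """Replace runs of two or more spaces ending at a tab stop with TAB (like -c)."""
    out, col, pending = [], 0, 0
    for ch in line:
        col += 1
        if ch == " ":
            pending += 1
            if col % size == 0:
                # A tab stop is reached: a run of 2+ spaces becomes one TAB.
                out.append("\t" if pending >= 2 else " " * pending)
                pending = 0
        else:
            out.append(" " * pending)  # spaces that did not reach a tab stop
            pending = 0
            out.append(ch)
    out.append(" " * pending)
    return "".join(out)

def repack(line, old_size=4, new_size=8):
    """Expand with one tab size, then compress with another (like -t4 -c8)."""
    return compress_spaces(expand_tabs(line, old_size), new_size)
```

Calling repack(line, 8, 8) corresponds to the plain -t -c case, where spaces hidden inside existing tabulation are squeezed out.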
DIAGNOSTICS
If you specify an option that
pep
does not recognize, then
pep
will
write a summary of usage and abort. Other errors on the
command line will result in
pep
writing an error message
before aborting.
On operating systems that support exit codes,
pep
will return an exit code upon termination.
If
pep
is interpreting ANSI escape sequences and notices
syntactical or semantical errors in the way they are used, a
warning is printed on the screen, prefixed with the string
"ansi:". This means that it is also possible to use
pep
to check if programs use ANSI sequences in a portable way.
FILES
- pep, pep.exe, pep.cmd
-
executable file (actual name depends upon which operating system you use).
- mac2ibm
-
small example of a user supplied conversion table
to convert from the Macintosh character set to that used on
the Norwegian version of the original IBM-PC (the sample file
only covers the Norwegian characters --- to complete it is
left as an exercise to the reader :-) ).
- ibm2mac
-
inverse of
mac2ibm:
conversion table from a small subset of
IBM CP 850 to Macintosh character set.
- ebc2ns7
-
conversion table from the IBM EBCDIC character set to the Norwegian
version of the ASCII 7-bit character set (ISO646 NS4551).
- ibm2ro8
-
conversion table from the IBM-PC 8-bit character
set to Hewlett-Packard ROMAN8.
- ro82ibm
-
inverse of
ibm2ro8:
conversion table from ROMAN8
to IBM-PC character set.
- ibm2iso
-
conversion table from the IBM-PC CP 850 8-bit character
set to ISO 8859/1.
- iso2ibm
-
inverse of
ibm2iso:
conversion table from ISO 8859/1 to CP 850.
AUTHOR
Copyright © 1989 Gisle Hannemyr.
Pep
may be freely distributed and copied, as long as this file
is included in the distribution and that these statements
about authorship and copyright is not altered or removed.
Bug reports, improvements, comments, suggestions and flames to:
Snail: Gisle Hannemyr, Brageveien 3A, 0452 Oslo, Norway.
Email: gisle@nr.uninett (EAN);
gisle@ifi.uio.no (Internet);
...!mcvax!ifi!gisle (UUCP);
(and several BBS mailboxes).
ACKNOWLEDGMENTS
Thanks to Robert Andersson, for the SYS-V rename
function; and to
Knut Borge, Bjoern Larsen, Knut Omang and Geir-Harald Strand,
for elucidation of the unspeakable mysteries of VMS.
Special thanks are due Inge Arnesen for finding and fixing a bug
(and to Nils-Eivind Naas for bringing it to my attention).
Several people have contributed ideas and/or bug reports.
In addition to those mentioned above,
Ola Garstad, Ottar Grimstad,
Tor Sjoewall, and Jens-Henrik Soerensen
should be mentioned. My apologies if anyone
is forgotten.
SEE ALSO
dd(1),
detex(1L),
convert(VMS),
expand(1),
od(1V),
strings(1),
tr(1),
unexpand(1).
Detex(1L)
is a lex-based program to convert LaTeX and TeX manuscripts into plain
ASCII text. It is available from the author upon request. Those marked
VMS are standard VMS utilities. The others are standard UNIX utilities.
BUGS
There is a very strong Norwegian bias in
pep.
In particular,
there exist several national versions of the ISO 646 7-bit
character set; but all built-in functions to convert between this
and various 8-bit character sets (i.e. -d, -i, -k and -m)
bluntly assume the standard Norwegian version of ISO 646. For pep
to work with other national 7-bit character sets, the
compiled-in conversion tables (type FOLDMATRIX for those who read the
source code) need to be extended.
The VMS version of
pep
runs with the
-o
option permanently enabled. This is because VMS does not support a
useful i/o redirection or pipe mechanism.
The VMS Record Management Service (RMS) knows of several record formats.
You can see what record format a file has by using the VMS DCL command
DIRECTORY/FULL
and examining the field "Record format".
On VMS systems, pep
will always generate output files with record format set to "Stream_LF",
but some programs may require that the output file is in another
format. To fix this, it might be necessary to run the output of pep
through the VMS CONVERT
utility. Please see the DEC VMS manuals for details.
The Macintosh "text only" format uses the carriage return (CR) character
(ASCII 13) as terminator. Most text processors (e.g. MacWrite)
seem capable of handling two conventions:
one is to use CR to terminate each line (and two or more
consecutive CRs between paragraphs); the other is to use CR between
paragraphs only.
Pep
is also capable of handling both conventions. The default behaviour
is to terminate each line, but the -v
option may be used to terminate paragraphs only.
Please note that pep
uses a rather simplistic heuristic to identify the end of a paragraph:
it bluntly assumes that paragraphs are separated by blank lines.
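The blank-line heuristic can be illustrated with a few lines of Python. This is a sketch, not pep's code: the function name is hypothetical, and joining the lines of a paragraph with a single space is an assumption about how soft line breaks are handled.

```python
def terminate_paragraphs_only(text, terminator="\r"):
    """Join the lines of each paragraph; blank lines separate paragraphs."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line ends the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "".join(p + terminator for p in paragraphs)
```

Text whose paragraphs are marked by indentation rather than blank lines would come out as one long paragraph, which is the limitation noted above.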
If you use the
-o
option, then the original input file will
be overwritten. Before you are familiar with
pep,
you may
find that it sometimes removes more material than you expect
from a file. It may be a good idea to always make a copy
of the original file before you start experimenting with
pep,
or you may add the
"b"
argument to the
-o
option
(-ob).
The built-in IBM-PC, DEC and Macintosh conversion tables
convert to and from the Norwegian version of 7-bit "ASCII"
characters. You should use the -g
option and "general" conversion tables for all other purposes.
Pep
only knows the ANSI sequences implemented in the
standard MS-DOS console driver
ANSI.SYS.
There cannot be a space character between an option and the
option's argument (e.g. you'll have to use
"-gfoo.bar",
not
"-g foo.bar").
Pep will only filter "regular" files. It will skip directories, sockets
and "special" files.
Links are the GOTOs of file systems. If you run a hard linked file
through pep using the
-o
option, the link will not be preserved. Pep will just skip soft
linked files.
Pep
searches for the conversion tables requested with the
-g
option in the following order: first the current directory,
then the directory of the file
PEP.EXE
(MS-DOS only), and finally the directory pointed to by the
PEP
environment
variable.
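The search order just described can be sketched in Python. The function name find_table and its parameters are hypothetical; program_dir stands in for the directory of PEP.EXE, which the manual says applies on MS-DOS only.

```python
import os

def find_table(name, program_dir=None, environ=os.environ):
    """Return the first existing path for a conversion table, or None.

    Search order: current directory, then program_dir (if given),
    then the directory named by the PEP environment variable.
    """
    candidates = [os.curdir]
    if program_dir:
        candidates.append(program_dir)
    if environ.get("PEP"):
        candidates.append(environ["PEP"])
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

A table in the current directory therefore shadows one of the same name in the PEP directory.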
Pep
knows nothing about the COFF-format and the
-s
option is
primitive compared to the UNIX command
strings(1).
So if you are on a UNIX-system --- forget about the
-s
option and use
strings(1)
instead.
Pep
will not convert Word Perfect documents into plain ASCII.
This much requested function is, however, built into Word Perfect.
It is named "store as DOS-text" and is activated by pressing
CTRL-F5 (at least in Word Perfect 4.2).