FP.EXE
Displays Those Lines from a Set of Files
That Are Matched by a Given Regular Expression Pattern
by
Robert A. Magnuson
DMB, DCRT, NIH
Bethesda, MD 20892
Feb 1988
Revised Apr 1988
FP (for Find Pattern) is a tool used to search DOS files for
those lines that are matched by a specified pattern. The found
lines are "printed", i.e., written to stdout.
[This document is intended to be read from the screen.
It contains some characters which will probably not print
correctly on a printer.]
A DOS command line that invokes FP contains a number of
arguments. The syntax is summarized by the following diagram:
    ┌────────────────┐
    │ options:       │ ┌───────────┐ ┌─ <pattern> ─┐ ┌─────────────┐
FP ─┴─ / ─┬ <ltr> ┬──┴─┴─ <ufile> ─┴─┤             ├─┴─┬ <file> ┬──┴─
          └───────┘                  └─ <patfile> ─┘   └────────┘
Option Definitions:
───────────────────
/c   just give count
/e   no more option args even if next arg begins with /
/f   pattern in <patfile>
/h   highlight match
/n   print line nr with line
/o   omit printing filename
/u   unmatched lines written to <ufile>
/v   invert match
/x   compare case exact
Note: Pattern can contain nonzero hexadecimal byte
representations "\xhh".
First there may be option arguments. Next there may be the
<ufile> name (if the /u option was taken). Then there is the
required pattern which is used to match the lines taken from the
files whose names follow. The pattern itself may appear (no /f
option), or the <patfile> names the file containing it (/f
option). The remaining filenames can be wildcarded. If no
filenames appear, FP gets its lines from stdin.
The arguments can optionally be enclosed in double
quotes. The enclosing quotes are stripped off and not
seen by FP. Should you need to have a double quote
within an argument, the argument must be double quoted
and the internal double quote must be escaped by
preceding it with a backslash. This treatment of the
double quotes is done by the argc/argv mechanism of the C
compiler. [FP is implemented in Borland Turbo C.]
FP exits with ERRORLEVEL set to one if some lines were matched,
to zero if none were matched, and to two for syntax problems.
OPTION ARGUMENTS:
FP options are specified by the presence or absence of various
option letters in option arguments. Any option argument must
begin with a slash and must appear in front of the other kinds of
arguments. Any number of legal option letters can appear in an
option argument. Thus, you can have multiple option arguments,
perhaps each with a single option letter (and each beginning with
a slash), or just one option argument containing all of the
option letters desired. CURRENTLY, ALL LEGAL FP OPTION LETTERS
ARE lower case.
For the sake of readability, the syntax diagram shown
above covers only the case where all option letters
appear in one option argument.
An FP syntax error occurs when illegal option letters appear or
when required arguments are missing. The pattern argument is
required.
When a syntax error occurs, FP prints a boxed syntax diagram
containing terse instructions on how to use FP. This mechanism
can be deliberately tripped in order to get on-screen help. The
suggested way is to invoke FP with no arguments--thus causing
the no-pattern syntax error.
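The argument order described above can be sketched as a small
parser. This is only an illustration in Python, not FP's actual
code (FP is written in C). The function name parse_fp_args, the
returned dictionary and the error messages are all hypothetical,
and it is an assumption that an argument containing the letter
"e" ends option processing immediately.

```python
LEGAL_OPTIONS = set("cefhnouvx")  # option letters from the option summary


def parse_fp_args(args):
    """Split an FP-style argument list into its parts.

    Returns a dict with keys: options, ufile, pattern, patfile, files.
    Raises ValueError for the syntax errors described in the text.
    """
    options = set()
    i = 0
    # Option arguments come first; each one begins with a slash.
    while i < len(args) and args[i].startswith("/"):
        letters = args[i][1:]
        illegal = [ch for ch in letters if ch not in LEGAL_OPTIONS]
        if illegal:
            raise ValueError("illegal option letter: " + illegal[0])
        options.update(letters)
        i += 1
        if "e" in letters:  # /e: no more option arguments follow (assumed)
            break
    rest = list(args[i:])
    ufile = None
    if "u" in options:  # /u requires the <ufile> name next
        if not rest:
            raise ValueError("missing <ufile>")
        ufile = rest.pop(0)
    if not rest:  # the pattern (or <patfile>) is required
        raise ValueError("missing pattern")
    first = rest.pop(0)
    return {
        "options": options,
        "ufile": ufile,
        "pattern": None if "f" in options else first,
        "patfile": first if "f" in options else None,
        "files": rest,  # an empty list means: read lines from stdin
    }
```

For example, parse_fp_args(["/e", "/usr", "x.txt"]) yields the
pattern "/usr", while the same call without "/e" raises
ValueError because "/usr" is then read as an option argument.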
FP prints the ordinal line number together with each found line
when the /n option is specified. Case (upper or lower) is
ignored unless the /x option has been taken. With the /c option
FP omits printing the found lines, printing instead just the
count of the number of lines found. The /o option omits printing
the filename(s), but is overridden by the /c option. The /v option
reverses the selection procedure, selecting those lines which are
not matched by the pattern. The /e option permits the pattern to
start with a slash--otherwise the pattern (with no preceding
<ufile>) would look like another option argument. The /h option
highlights the matching substrings in each displayed line on the
screen. The /h option--overridden by the /c and /v
options--should not normally be used if the output is not
intended for the screen. The /u option extends FP, allowing the
unmatched lines to be copied to a file.
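As a rough model of how the /n, /c, /v and /x options interact,
here is a minimal Python sketch. It is not FP's implementation.
The function fp_select and its keyword names are hypothetical,
the pattern is written in Python's regular-expression syntax
rather than FP's, and the "number: line" output format is an
assumption (this document does not show FP's exact layout).

```python
import re


def fp_select(lines, pattern, number=False, invert=False,
              exact_case=False, count_only=False):
    """Return what an FP-like filter might print for one input."""
    flags = 0 if exact_case else re.IGNORECASE  # /x makes case significant
    regex = re.compile(pattern, flags)
    selected = []
    for lineno, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        if matched != invert:  # /v selects the lines that did not match
            selected.append((lineno, line))
    if count_only:  # /c prints only the count
        return [str(len(selected))]
    if number:  # /n prefixes the ordinal line number (format assumed)
        return ["%d: %s" % (n, text) for n, text in selected]
    return [text for _, text in selected]
```

With lines = ["Cat", "dog", "concatenate"], the call
fp_select(lines, "cat") selects the first and third lines, and
adding exact_case=True leaves only "concatenate".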
FP uses ANSI.SYS escape sequences for the
highlighting. These escape sequences are effective when
they reach the screen. Hence, if you redirect the output
to a file while the /h option is in effect, those escape
sequences become part of the file. Later on, if you TYPE
the file, the highlighting will be done because the
escape sequences are in the file. But, if the redirected
material is intended, say, to be part of a source
language program, the included escape sequences are not
what you want there.
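To make the point about escape sequences concrete, here is a
small Python sketch of highlighting with ANSI sequences and of
stripping them again from captured output. It assumes reverse
video (ESC[7m) followed by a reset (ESC[0m); the sequences FP
really emits are not listed in this document. The names
highlight and strip_ansi are hypothetical.

```python
import re

REVERSE = "\x1b[7m"  # ANSI "reverse video" (assumed, not taken from FP)
RESET = "\x1b[0m"    # ANSI "all attributes off"


def highlight(line, pattern):
    """Wrap every nonempty match of pattern in ANSI escape sequences."""
    regex = re.compile(pattern, re.IGNORECASE)

    def wrap(match):
        text = match.group(0)
        return REVERSE + text + RESET if text else ""

    return regex.sub(wrap, line)


def strip_ansi(text):
    """Remove ANSI attribute sequences, e.g. from redirected output."""
    return re.sub(r"\x1b\[[0-9;]*m", "", text)
```

If highlighted output has already been redirected to a file,
something like strip_ansi can recover the plain text.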
REGULAR-EXPRESSION PATTERN-MATCHING:
FP selects the lines by means of pattern matching. The <pattern>
argument is matched against each file line. Case (upper or
lower) is ignored unless the /x option has been taken.
A pattern is a string of characters. We distinguish two kinds of
characters: normal and meta. There are exactly nine
metacharacters:
. * + ? ^ $ [ ] \
The remaining characters are normal. Metacharacters sometimes
combine with normal characters as we will see. Otherwise, a
normal character simply matches itself. The metacharacters
behave as follows:
A         matches
────────  ────────────────────────────
.         any single byte
*         0 or more of the preceding
+         1 or more of the preceding
?         0 or 1 of the preceding
[...]     any 1 of the enclosed bytes
[^...]    any byte not enclosed
^         the beginning of the line
$         the end of the line
\α        α, where α is a metabyte
α\!ß      α or ß
\(α\)     α (grouped for precedence)
\δ        the δth group where δ is 1-9
\b        beginning/end of a word
\<        beginning of a word
\>        end of a word
\w        a word byte: [a-zA-Z0-9]
\W        a nonword byte: [^a-zA-Z0-9]
Note that the '\' metacharacter is used as an escape, i.e., to
quote a metacharacter. Thus to match, e.g., a period (as a
normal character) you must use '\.' If you leave out the
backslash, the period alone will have its metacharacter meaning.
In the above explanation of the '*', '+' and '?' metacharacters,
'preceding' means 'the shortest possible preceding' item: a
single character, a character class, or a group. Thus, 'ab+'
matches 'ab', 'abb', etc., but not 'abab'.
The square bracket metacharacters specify any one of the enclosed
characters--known as a character class. The minus sign has a
special meaning as a range in a character class. '[a-g]' can be
used in place of '[abcdefg]'. When appearing first in a
character class, a circumflex indicates that the match is with
any character not in the character class. Thus, '[^0-9]' matches
any character that is not a decimal digit. Most metacharacters
lose their special status in a character class, and should not be
escaped. If a right square bracket is to be in a character class,
it must immediately follow the opening left square bracket. If a minus
sign is to be in a character class, it must appear as '---',
i.e., a range containing only itself. Since the square brackets
do not nest, a left square bracket can easily be included in a
character class. E.g., '[][]' matches a right or a left square
bracket.
Some pattern match examples follow:
A                     matches
──────────────────    ───────────────────────
zyx                   zyx
f.x                   fax, fix and fxx
f\.x                  f.x
f[aix]x               only fax, fix and fxx
\[[a-z]+\]            [hello] and [world]
\(suf\!pre\)fix       suffix or prefix
ba\(na\)*             banananana
[A-P]:                A:, C:, H:, etc
\([cd]:\)?\w          abc, c:zyx, d:cat, etc
\(abra\)\(cad\1\)*    abracadabracadabra
Due to a bug in PC DOS I have changed the alternative
(i.e., the "or") from '\|' to '\!'. The vertical bar, '|',
is DOS's piping symbol. Although doublequoting is
supposed to protect any redirection symbols in the
interior from being acted upon, under certain
circumstances DOS performs the redirection even though
it is doublequoted.
Please note that FP's pattern matching is done via REGEX.C from
Free Software Foundation, Inc.
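For readers who want to try FP-style patterns elsewhere, the
following Python sketch translates the syntax described above
into the syntax of Python's re module. It is an approximation
written for this document, not a port of REGEX.C. The names
fp_to_python, ESCAPES and _translate_class are hypothetical, the
"\xhh" hexadecimal form is not handled, and the exact behavior
of FP in corner cases is not known here.

```python
import re

WORD = "A-Za-z0-9"  # FP's word bytes, per the \w entry in the table
BEGIN = "(?<![%s])(?=[%s])" % (WORD, WORD)  # \< beginning of a word
END = "(?<=[%s])(?![%s])" % (WORD, WORD)    # \> end of a word
ESCAPES = {
    "(": "(", ")": ")", "!": "|",
    "w": "[%s]" % WORD, "W": "[^%s]" % WORD,
    "<": BEGIN, ">": END, "b": "(?:%s|%s)" % (BEGIN, END),
}


def _translate_class(pattern, i):
    """Translate the class starting at pattern[i] == '['."""
    j = i + 1
    out = "["
    if pattern.startswith("^", j):
        out += "^"
        j += 1
    if pattern.startswith("]", j):  # a leading ']' is a literal
        out += "\\]"
        j += 1
    while j < len(pattern) and pattern[j] != "]":
        if pattern.startswith("---", j):  # FP's way to write a literal '-'
            out += "\\-"
            j += 3
        else:
            ch = pattern[j]
            # Backslash and '[' are ordinary inside an FP class.
            out += "\\" + ch if ch in "\\[" else ch
            j += 1
    if j >= len(pattern):
        raise ValueError("unterminated character class")
    return out + "]", j + 1


def fp_to_python(pattern):
    """Return a Python re pattern roughly equivalent to an FP pattern."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("trailing backslash")
            nxt = pattern[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            elif nxt in "123456789":
                out.append("(?:\\%s)" % nxt)  # back-reference to a group
            else:
                out.append(re.escape(nxt))    # quoted metacharacter
            i += 2
        elif ch == "[":
            text, i = _translate_class(pattern, i)
            out.append(text)
        elif ch in ".*+?^$":
            out.append(ch)                    # same meaning in both syntaxes
            i += 1
        else:
            out.append(re.escape(ch))         # normal character
            i += 1
    return "".join(out)
```

For instance, fp_to_python(r"\(suf\!pre\)fix") gives
"(suf|pre)fix". Compile the result with re.IGNORECASE to mimic
FP's default of ignoring case.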
HEXADECIMAL REPRESENTATION IN PATTERN:
Sometimes it is convenient or necessary to represent bytes in a
coded fashion. You may need a smiley face, for example. You can
keyboard this character directly in two ways: (1) by entering a
control-A, or (2) by holding down the ALT key, typing a "1" on
the numeric keyboard, then releasing the ALT key. But when you
need to document this character on your printer, the smiley face
does not print at all! Worse yet, if you need to enter a tab on
the DOS command line, DOS may translate it into spaces (up to the
next tab stop). FP allows characters to be entered in a
hexadecimal format, either as
\xhh
or as
\xh
where the h's are hexadecimal digits. Both the "x" and the A
through F can be upper/lower case (or mixed). The byte
represented must be nonzero.
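The hexadecimal notation can be illustrated with a short Python
sketch that expands the coded bytes in a pattern string. The
function name expand_hex is hypothetical and this is not FP's
own code. It assumes a backslash followed by "x" always starts
a hexadecimal escape, and it does not address how FP treats an
expanded byte that happens to be a metacharacter.

```python
import re

# A backslash, an x in either case, then one or two hexadecimal digits.
_HEX = re.compile(r"\\[xX]([0-9A-Fa-f]{1,2})")


def expand_hex(pattern):
    """Replace each \\xh or \\xhh in pattern by the byte it denotes."""
    def to_char(match):
        value = int(match.group(1), 16)
        if value == 0:
            # The option summary allows only nonzero bytes.
            raise ValueError("zero byte is not allowed in a pattern")
        return chr(value)

    return _HEX.sub(to_char, pattern)
```

Under these assumptions, expand_hex(r"a\x09b") returns the
three characters "a", tab, "b", and expand_hex(r"\x1") returns
the control-A (smiley face) character.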
EXAMPLES:
To search for lines containing 'cat', do an
fp cat alpha.txt
This will find lines containing 'cat', irrespective of case, and
regardless of surrounding material. To find lines containing
'cat' as a word--and not 'cats', 'cathode', or 'indicate', do an
fp \bcat\b alpha.txt
The '\b's say that 'cat' must not butt up against letters or
digits. To find lines containing 'cat' or 'cats' as words, do an
fp \bcats?\b alpha.txt
That pattern may be pronounced as: wordbreak, 'cat', 0 or 1 's',
wordbreak.
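The three 'cat' patterns above can be tried out in Python, whose
re module happens to accept them unchanged. This is only an
approximation of FP: Python's '\b' also treats the underscore as
a word character, whereas FP's word bytes are letters and digits
only. The helper find_lines and the sample lines are made up for
the illustration.

```python
import re

SAMPLE = [
    "The cat sat.",
    "Two cats.",
    "A cathode ray",
    "Please indicate",
    "CAT scan",
]


def find_lines(pattern, lines):
    """Return the lines matched by pattern, ignoring case as FP does."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]


for pattern in (r"cat", r"\bcat\b", r"\bcats?\b"):
    print(pattern, "->", find_lines(pattern, SAMPLE))
```

The plain pattern selects all five sample lines, the second
selects only "The cat sat." and "CAT scan", and the third adds
"Two cats." to those two.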
To search for empty lines, do an
fp/v . alpha.txt
To search for lines that contain no doublequotes, do an
fp/v "\"" alpha.txt
The pattern seen by FP in this case is simply one doublequote.
The C compiler's argc-argv handler sees a doublequoted escaped
doublequote. The outer doublequotes are stripped, and the inner
escaped doublequote becomes a doublequote. The resultant
doublequote is then given to FP as the second command-line
argument. [The first argument is '/v'.]
Suppose one of your utilities, say FOO, has been revised such
that one of its little used options no longer functions as it
used to. Now you must search all your batch files for
invocations of FOO. The following FP will show you all batch
file lines containing 'foo' as a separate word.
fp/n \bfoo\b c:\util\*.bat
A second way of searching for batch files containing FOO is to
use the count option.
fp/c \bfoo\b c:\util\*.bat
This will print each filename followed by the count of the number
of lines that contain FOO. If all of the counts are zero, no
action need be taken. Carrying this one step further, why not
just ask which of the counts are nonzero? To do this we pipe
the above FP's output to another FP invocation:
fp/c \bfoo\b c:\util\*.bat | fp/v \b0$
where we want to see only those batch files whose FOO count is
not zero. The second FP's pattern is for '0' as a word that ends
the line. The /v option gives the cases where the first FP gave
a nonzero count.
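The two-stage pipe can be modeled in Python to show why the
second pattern needs the '\b'. This is a sketch under
assumptions: the report format "filename count" is a guess (the
text says only that the count follows the filename), the file
contents are passed in as a dictionary instead of being read
from disk, and the names count_report and drop_zero_counts are
hypothetical.

```python
import re


def count_report(files, pattern):
    """Model of 'fp/c': one 'name count' line per file (format assumed)."""
    regex = re.compile(pattern, re.IGNORECASE)
    report = []
    for name, lines in files.items():
        count = sum(1 for line in lines if regex.search(line))
        report.append("%s %d" % (name, count))
    return report


def drop_zero_counts(report):
    """Model of 'fp/v \\b0$': keep lines that do not end in the word '0'."""
    zero = re.compile(r"\b0$")
    return [line for line in report if not zero.search(line)]
```

A file with ten matches produces a line ending in "10". That
line survives the filter because there is no word boundary
between the "1" and the "0", so only true zero counts are
dropped.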
Suppose you want to see lines in your FOO.C file that contain a
two-byte variable of the form: "a" followed by a digit. The
following FP will display such lines each with its ordinal line
number and with each match highlighted in reverse video.
fp/nh \ba[0-9]\b foo.c
Here's an example of how to get a "combined" directory where the
DIR's output is piped to FP. Suppose, in a large directory with
many files, you want to see those files with extensions of DOC or
BAT. In the following line FP searches through stdin (no
filenames in the FP command) for DOC or BAT surrounded by blanks.
dir|fp " doc \! bat "
Note that the pattern is doublequoted because of its internal
blanks. This is an example that failed when using the original
'\|' for the "or".
Perhaps you are looking for a filename that you can't remember.
But you think it has a double letter in it. Use the following
where the pattern matches a letter followed by itself.
dir|fp \([A-Z]\)\1
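The same back-reference idea can be tried in Python, where the
group is written with plain parentheses instead of '\(' and
'\)'. The function name has_double_letter and the sample
filenames are invented for the illustration. Whether FP's
back-reference ignores case in the same way is not stated in
this document.

```python
import re

# One letter captured as group 1, immediately followed by itself.
DOUBLE = re.compile(r"([A-Z])\1", re.IGNORECASE)


def has_double_letter(name):
    """True if name contains some letter twice in a row."""
    return DOUBLE.search(name) is not None


names = ["COMMAND.COM", "README.DOC", "BOOK.BAT", "FP.EXE"]
print([name for name in names if has_double_letter(name)])
```

Of the four sample names, only COMMAND.COM (MM) and BOOK.BAT
(OO) contain a doubled letter.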