Text File | 1991-02-15 | 29KB | 731 lines
FGREP 2.00
----------
"|" denotes changes in the recent versions.
Purpose
-------
FGREP (Fast GREP) is a small utility that can be used to find
strings of characters in ASCII text files, and arbitrary
sequences of bytes in other files. String search capabilities
are not extensive (no regular expressions), but FGREP is small
and very, very fast. FGREP is intended to replace the FIND
filter with something faster and more flexible.
Version 2 of FGREP is the first version to come in both DOS and
OS/2 versions. It also adds a recursive (subdirectory) search
feature that allows you to search an entire directory tree
with one command, and a feature to display only the matched
text (rather than the line containing the matched text).
UNIX people: we fully realize that this isn't the grep or fgrep
with which you are familiar. We know that the RE in GREP means
"regular expressions" and we know that we don't support regular
expressions. However, the name serves to point people in the
right direction. Please don't write to tell us that this "isn't
really grep".
DOS vs OS/2 versions
--------------------
Beginning with version 2.0, FGREP is supplied in both DOS and
OS/2 flavors. Usage and syntax are identical except for the -p and
-P options (see below). In general, the two versions will be
supplied in separate archives; FGREP.EXE is the OS/2 version,
and FGREP.COM is the DOS version.
Incompatibility notes
---------------------
In the OS/2 version, -p and -P are different options. -p is
used to pause on full screen; -P is used to set the process
priority. In the DOS version, -p is case insensitive and always
refers to the pause option.
The absolute maximum line length in the OS/2 version is about
28K characters, versus about 60K characters for DOS. See the
NOTES for more information.
Syntax
------
FGREP's syntax is
| FGREP [-abBceflmMorpPsvwxz01] target {@file}
This looks complicated, with lots of switches, but the most
common use is very simple:
FGREP hello myfile.txt
This command would display all lines of MYFILE.TXT that contain
the string "hello".
Targets
-------
The target is what you're looking for in the file. There are
two ways to specify targets: as strings and as series of
hexadecimal bytes. The two can be combined in a single target
specification.
Strings are sequences of ASCII characters, like "hello".
Normally, you can just type in the string and forget it, but you
may have to "delimit" the string, i.e., bracket it by a pair of
matching non-alphanumeric characters (anything except '-' and
'$'):
'string'
/string/
.string.
Choose a delimiter that does not appear in the string. This is
no good:
'you've made a mistake'
Delimiters are required for a string target if any of these four
conditions is met:
-- it is combined with hex byte sequences (more below)
-- it contains spaces or tabs
-- it begins with a non-alphanumeric character
-- it contains the redirection characters < > |.
In the last case, the string MUST be delimited by double quotes
("), otherwise DOS will interpret it as redirection.
Examples of valid string targets:
mov
ax
"two words" (requires delimiter: contains a space)
'/7' (begins with non-alphanumeric)
"f->x" (contains ">", must use double quotes)
It is always OK to delimit a string, even if delimiters are
unnecessary.
REMEMBER that if the target contains DOS redirection characters
(<, >, or |), it MUST be delimited with double quotes ("") or
DOS will try to perform unwanted redirection. For example:
FGREP "a <= b" myfile.pas
Hex byte sequences are used for binary file searching when the
bytes to be searched for are non-displayable characters or make
no sense as strings (e.g., program code). They are specified as
pairs of hexadecimal digits (one pair per byte) introduced by a
leading '$':
$EF
$CD21 (CDh, 21h)
$CD$21 (identical; either format is OK)
$00FFFE00 (00h, FFh, FEh, 00h)
$CD 21 (ILLEGAL: no spaces permitted)
You can combine strings and hex sequences by using a '+':
"abc"+$E74A+"def"
Target wildcards
----------------
String targets (not hex targets) may contain one or more "?"
wildcards. The ? will match any single non-null character in
the file. E.g., "[?i]" will match "[si]", "[di]", etc., but not
"[i]". Wildcards are not permitted when hex byte sequences are
used.
If you want to search for the '?' character literally, use
'\?'. For example, the target
what\?
will search for the literal string "what?".
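The wildcard rules above can be sketched in Python by translating
a string target into a regular expression. This is only an
illustration of the documented behavior, not FGREP's own code;
the function name target_to_regex is made up for the example.

```python
import re

def target_to_regex(target, case_sensitive=False):
    """Translate an FGREP-style string target into a compiled regex.

    '?'  matches any single non-null character.
    '\\?' matches a literal question mark.
    Searches ignore case unless case_sensitive is set (like -c).
    """
    parts = []
    i = 0
    while i < len(target):
        if target.startswith("\\?", i):
            parts.append(re.escape("?"))      # escaped: literal '?'
            i += 2
        elif target[i] == "?":
            parts.append(r"[^\x00]")          # wildcard: any non-null char
            i += 1
        else:
            parts.append(re.escape(target[i]))
            i += 1
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("".join(parts), flags)
```

For example, target_to_regex("[?i]") finds a match in "mov ax,[si]"
but not in "[i]".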
Specifying target files
-----------------------
You can list zero (see "Redirection"), one, or multiple files to
be searched. Files listed may include DOS's * and ? wildcards.
Paths may be specified. Examples:
myfile.txt
*.txt
*.*
this.c that.c other.c
*.c *.pas
C:\UTIL\*.* D:\SYS\*.*
E:\LIBRARY\*.D?C
| You may also specify a drive (with a terminating ':') or
| a directory (with a terminating '\'), indicating all files
| (*.*) in the specified drive or directory:
|
| c: [same as c:*.*]
| \src\ [same as \src\*.*]
File lists
----------
If the name of a file on the command line is prefixed with an
'@' character (e.g., '@list.txt'), the file is assumed to be a
text file containing the names of files to be searched. For
example:
fgrep hello @files.lst
FGREP will assume that the file FILES.LST is a text file that
contains the names of files to be searched. Each line of such a
file should contain the name(s) of one or more files to be
searched. Files in file lists are specified exactly as are
files named on the command line, except that you can't use
another '@' file list; that is, file lists can't be nested.
Here is an example of a valid list file:
this.c
c:\c\*.c d:\c\*.c
d:\misc\*.txt
The name specified for the list file cannot contain wildcards;
that is, this is illegal:
fgrep hello @lists.* (WRONG)
If you want to search for text in a file whose name begins with
an '@', use a double '@'. For example,
fgrep hello @@foo.txt
will search for 'hello' in the file @FOO.TXT.
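As a rough sketch of the rules in this section and the previous
one, the following Python expands a list of file arguments into
plain filespecs. It is an assumption-laden illustration, not
FGREP's implementation: the names normalize_spec and
expand_file_args are invented, the list file is read through a
caller-supplied function, and treating '@@' inside a list file
the same way as on the command line is a guess.

```python
def normalize_spec(spec):
    """Expand the drive-only and directory-only shorthands to '*.*'."""
    if spec.endswith(":") or spec.endswith("\\"):
        return spec + "*.*"
    return spec

def literal_or_error(item):
    """Handle one entry from a list file ('@@x' means the file '@x')."""
    if item.startswith("@@"):
        return normalize_spec(item[1:])
    if item.startswith("@"):
        raise ValueError("file lists cannot be nested: " + item)
    return normalize_spec(item)

def expand_file_args(args, read_lines):
    """Return the filespecs named by args.

    read_lines(name) must return the lines of the list file 'name'.
    """
    specs = []
    for arg in args:
        if arg.startswith("@@"):
            specs.append(normalize_spec(arg[1:]))   # file named '@...'
        elif arg.startswith("@"):
            list_name = arg[1:]
            if "*" in list_name or "?" in list_name:
                raise ValueError("list file name cannot contain wildcards")
            for line in read_lines(list_name):
                # Each line may name one or more files.
                specs.extend(literal_or_error(item) for item in line.split())
        else:
            specs.append(normalize_spec(arg))
    return specs
```

The resulting filespecs may still contain * and ? wildcards;
matching them against a directory is a separate step.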
Redirection
-----------
If no file is listed, FGREP will take its input from the
standard input device, allowing it to be used as a
filter. For example:
DIR | fgrep pas
would display any lines from a directory listing that contain
the string "pas". Or,
DIR | fgrep "<dir>"
would list all subdirectories of the current directory.
This command:
arc p somefile foo.txt | FGREP somestring
would display all lines from the file FOO.TXT in the archive
file SOMEFILE that contain "somestring".
FGREP will terminate with an error if no file is listed and
standard input is not redirected. Otherwise, it would be
waiting for keyboard input, possibly leading you to believe
that it had died.
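The decision described above (files, redirected standard input,
or an error) might look like this in Python. It is a minimal
sketch; choose_input is a hypothetical name, and using isatty()
to detect an unredirected console is an assumption about how
such a check could be made.

```python
import sys

def choose_input(file_args, stdin=None):
    """Decide where input comes from, following the rules above."""
    stdin = sys.stdin if stdin is None else stdin
    if file_args:
        return "files"            # at least one file was listed
    if stdin.isatty():
        # Nothing listed and nothing piped in: refuse rather than
        # sit waiting for keyboard input.
        raise ValueError("no file listed and standard input is not redirected")
    return "stdin"                # act as a filter
```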
Examples of use
---------------
Here are some examples of FGREP use:
1. List all lines of MYFILE.TXT containing "hello"
FGREP hello myfile.txt
2. List all lines of all *.C files in the current directory that
contain "include foo.c":
FGREP "include foo.c" *.c
3. List all lines from FILEA.EXT, FILEB.EXT, and FILEC.EXT that
contain the string "abcd" followed by any character:
FGREP abcd? filea.ext fileb.ext filec.ext
4. Display occurrences of the byte CDh followed by 21h (a DOS
call) in the program MYPROG.EXE:
FGREP $CD21 myprog.exe
5. Display occurrences of the string "abc" followed by a zero
byte (00h) in all files in the C:\UTIL and D:\SYS directories:
FGREP 'abc' + $00 c:\util\*.* d:\sys\*.*
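The target forms used in these examples (plain strings, delimited
strings, and '$' hex sequences joined with '+') can be turned into
byte sequences along the following lines. This Python sketch is
only an illustration: parse_target is an invented name, wildcards
are not handled, and accepting spaces around '+' is an assumption
based on example 5.

```python
import string

HEX_DIGITS = set(string.hexdigits)

def parse_target(spec):
    """Return the bytes denoted by an FGREP-style target."""
    # A target starting with a letter or digit is a plain string.
    if spec[:1].isalnum():
        return spec.encode("ascii")

    out = bytearray()
    i, n = 0, len(spec)
    while i < n:
        ch = spec[i]
        if ch == "$":
            # One or more '$' runs of hex digit pairs: $CD21 or $CD$21.
            while i < n and spec[i] == "$":
                j = i + 1
                while j < n and spec[j] in HEX_DIGITS:
                    j += 1
                digits = spec[i + 1:j]
                if not digits or len(digits) % 2:
                    raise ValueError("hex digits must come in pairs")
                out += bytes.fromhex(digits)
                i = j
        elif ch == "-" or ch.isalnum() or ch.isspace():
            raise ValueError("combined strings must be delimited")
        else:
            # Delimited string: runs to the next copy of the delimiter.
            end = spec.find(ch, i + 1)
            if end == -1:
                raise ValueError("missing closing delimiter")
            out += spec[i + 1:end].encode("ascii")
            i = end + 1
        # After a term, allow end of target or '+' and another term.
        rest = spec[i:].lstrip(" ")
        if not rest:
            break
        if not rest.startswith("+") or not rest[1:].strip(" "):
            raise ValueError("expected '+' followed by another term")
        i = n - len(rest[1:].lstrip(" "))
    return bytes(out)
```

With this sketch, $CD21 and $CD$21 yield the same two bytes, and
$CD 21 is rejected because of the embedded space.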
Output
------
FGREP's default screen output looks like this:
**File <filename>
[text of lines containing string]
**File <filename>
[text of lines containing string]
If the "Microsoft" switch (-m or -M, see below) is used, the
output is in this format:
FILE.EXT(linenum): text
This format is the standard format used by Microsoft (R)
language products and by its MEGREP. If -m is used, only the
filename is displayed; if -M is used, the path will be included
| if one was specified on the command line (or if the path was
| generated by a recursive subdirectory search).
When a hex byte sequence or the -b switch is specified, FGREP
uses the binary output mode. See below for details.
All useful output is sent to the standard output device, so it
may be piped to other programs or redirected to file:
FGREP target filea | yourprog
FGREP target filea > test.txt
Error messages and the program logo will always appear on
the console device, and will never appear in redirected or piped
output.
FGREP returns an errorlevel to the operating system. It will be
one of:
0: String not found in any file
1: String found in at least one file
255: Error (file read error or bad parameter)
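Note that these values are the reverse of the UNIX grep convention
(where 0 means "found"). The Python sketch below imitates the
default output and the errorlevel for a plain, case-insensitive
text search. It is an approximation for illustration only: the
function name search_files is invented, and the exact header
spacing is assumed from the sample output above.

```python
import sys

def search_files(target, paths, out=None):
    """Print FGREP-like default output; return an FGREP-like errorlevel."""
    out = sys.stdout if out is None else out
    needle = target.lower()            # default search ignores case
    found = False
    for path in paths:
        try:
            with open(path, "r", errors="replace") as fh:
                lines = fh.read().splitlines()
        except OSError:
            return 255                 # file read error
        out.write("**File " + path + "\n")
        for line in lines:
            if needle in line.lower():
                out.write(line + "\n")
                found = True
    return 1 if found else 0           # 1 = found, 0 = not found
```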
Switches
--------
These are the option switches that control how FGREP works. All
switches are case-insensitive except -m/M, -b/B, and (in the OS/2
version) -p/P; for example, either -a or -A will set ANSI mode. All
switches must precede the target on the command line, but they may
be in any order. If you specify conflicting switches, the last one
will be effective (except -v with -z, which is an error).
-a is ANSI mode. If you have ANSI.SYS (or equivalent)
installed, escape characters (ASCII 27) in displayed text
lines can cause odd results. If you use the -A switch,
FGREP will substitute an upside-down question mark (¿) for
ESC, possibly resulting in cleaner displays. You'll only
need this switch if you have ANSI installed and there are ESC
characters in the files you're scanning. Ignored in binary
mode.
-b specifies binary search mode/output. This switch is
automatically set if you use a hex byte sequence in the
target. See "binary search" below.
-B specifies a case insensitive binary search.
-c makes the search case sensitive ("String" will not match
"string" or "STRING"). Normally, FGREP ignores case. This
switch is ignored in binary mode.
-e specifies that ONLY an errorlevel is to be returned.
There will be no display at all. This is equivalent to the
combination -s0.
-f causes the "**File" header lines to be displayed only for
those files that contain the search string.
-l adds line numbers to FGREP's output. Ignored in binary mode.
-m or -M specify Microsoft (R) output format. See "Output"
above. If -m is used, only the filename will appear. If
-M is used, the path will be included if one was specified on
the command line. Ignored in binary mode.
-o specifies a maximum output width. It should be followed by a
decimal number from 1 to 255. For example, -o40 causes FGREP
to truncate all output to a maximum of 40 characters per
line. Do not confuse the "O" (Output) switch with the "0"
(zero text) switch. Ignored in binary mode.
-p pauses on full screen of output.
| -P sets the process priority (OS/2 only). See below for
| details.
|
| -r recursive (subdirectory) search. For each filespec provided,
| fgrep searches the specified directory and all subdirectories
| for matching file names. If no directory is specified, FGREP
| begins the search with the current directory. For example:
|
| fgrep -r "text" *.c d:\src\*.c
|
| This command searches all *.c files in the current directory
| and all of its subdirectories; then it searches all *.c files
| in D:\SRC and all of its subdirectories.
-s suppresses the "**File" header lines in the output.
-v provides a reVerse or negative search. That is, all lines
that do NOT contain the specified string are displayed. This
provides a handy way to get rid of lines containing specified
text. Suppose, for example, that you have a file containing
a list of file names, and you are interested in all files
EXCEPT those that contain a '$' in the name (perhaps they are
temporary files):
FGREP -v "$" filename
This switch is ignored in binary mode.
-w indicates that white space (blanks and tabs) is not
significant. White space in both the search string and the
input file will be ignored. If -w is specified, the wildcard
character (?) will match any nonblank character. Ignored in
binary mode.
-x do not display the logo/copyright.
| -z displays only the actual text found (rather than the entire
| line containing the text). This is useful with wildcards;
| for example, the command
|
| fgrep -z "??rse" dict.txt
|
| might display "parse", "purse", "horse", etc. (perhaps
| from a dictionary). Note that the displayed text will be
| all uppercase if the search is case insensitive, and any
| embedded spaces will be missing if the search is white
| space insensitive (-w option). -z and -v (reVerse search)
| are incompatible.
-0 ("0" text lines) suppresses the display of lines of text
containing the specified string. FGREP will skip immediately
to the next file when the string is found. Do not confuse
this with the "O" (Output width) switch.
-1 ("1" text line) specifies that only the first line containing
the specified string in each file will be
displayed. FGREP will then skip immediately to the next
file.
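To make the interaction of -c, -v, and -w concrete, here is a
small Python sketch of line selection for a text search without
wildcards. It follows the switch descriptions above but is not
FGREP's code; select_lines and its parameter names are invented
for the example.

```python
def strip_ws(text):
    """Remove blanks and tabs, as a -w style search would ignore them."""
    return text.replace(" ", "").replace("\t", "")

def select_lines(lines, target, case_sensitive=False,
                 reverse=False, ignore_ws=False):
    """Yield the lines a search would display.

    case_sensitive ~ -c, reverse ~ -v, ignore_ws ~ -w.
    """
    needle = strip_ws(target) if ignore_ws else target
    if not case_sensitive:
        needle = needle.lower()
    for line in lines:
        hay = strip_ws(line) if ignore_ws else line
        if not case_sensitive:
            hay = hay.lower()
        # With reverse set, keep the lines that do NOT contain the target.
        if (needle in hay) != reverse:
            yield line
```

For instance, a reverse search for "$" over the names a.tmp,
b$.tmp, and c.txt yields a.tmp and c.txt.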
Binary search
-------------
If you use a hex byte sequence in your target, or if you specify
the -b or -B switch, FGREP uses a "binary" mode rather than the
usual text mode. In this mode, FGREP is not concerned with
"lines" of text and can be used to search any file for arbitrary
sequences of bytes.
The output format is different. Because there are no "lines" to
display, FGREP shows the locations in the file where the target
was found and the sixteen bytes surrounding the find (eight
before, and eight after). Here is the output format, split into
two lines:
nnnnnnnn: n n n n n n n n . n n n n n n n n
cccccccc . cccccccc
The initial sequence is the 32-bit file offset where the target
was found, in hex. For example,
0001C47A:
indicates that the target was found at file offset 1C47A.
The next two series (n n n ...) represent the eight bytes before
and after the target (which, logically, occurred at the '.'
position):
4B 7F 37 42 42 44 FF FF . 20 20 21 48 48 4F 3C 22
The final series (ccc...) represents the same sixteen bytes
displayed as characters. Nondisplayable characters are shown as
periods.
Unlike text searches, binary searches are normally case
sensitive. That is, $41 ('A') will not match $61 ('a'). If you
want a binary search to be case INsensitive ($41 matches both
$41 and $61), use -B instead of -b. The normal case sensitive
switch (-c) is ignored.
These switches are also ignored in binary mode:
-l (show line numbers)
-w (whitespace insensitive)
-v (inverse search)
-m/M (Microsoft format)
Also, wildcards are not permitted in string targets when a
binary search is used.
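The binary output described in this section can be approximated
in Python as follows. This is a sketch under assumptions: the
names find_hits and format_hit are invented, the spacing between
the hex and character parts is a guess, and nothing is padded
when fewer than eight bytes are available near the start or end
of the file.

```python
def find_hits(data, target, ignore_case=False):
    """Return the offsets of target in data (ignore_case ~ -B)."""
    if ignore_case:
        data, target = data.lower(), target.lower()
    hits = []
    pos = data.find(target)
    while pos != -1:
        hits.append(pos)
        pos = data.find(target, pos + 1)
    return hits

def hexes(chunk):
    return " ".join(f"{b:02X}" for b in chunk)

def chars(chunk):
    # Nondisplayable bytes are shown as periods.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)

def format_hit(data, offset, target_len):
    """Format one hit: offset, 8 bytes before, '.', 8 bytes after."""
    before = data[max(0, offset - 8):offset]
    after = data[offset + target_len:offset + target_len + 8]
    return (f"{offset:08X}: {hexes(before)} . {hexes(after)}"
            f"  {chars(before)} . {chars(after)}")
```

Usage: for each offset returned by find_hits(data, target), print
format_hit(data, offset, len(target)).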
OS/2 priority setting
---------------------
Under OS/2, FGREP, like all programs, starts out with a
"standard" priority (regular, lowest level). The -P option
allows you to alter the priority so that FGREP gets a greater
or lesser share of available timeslices. The format is:
-P[c | r | i] * | #
The first letter sets the priority class, as follows:
c = time critical (highest)
r = regular (default)
i = idle (lowest)
If no class letter is present, 'r' is assumed. The priority
class letter is case sensitive.
After the priority class letter is a priority value from 1
(lowest) to 32 (highest), or '*', which means "highest" (32).
Examples:
-Pc32 critical, priority 32 (highest possible)
-Pc* same as -Pc32
-Pr16 regular, priority 16 (middle of the road)
-P16 same as -Pr16
-Pi1 idle, priority 1 (lowest possible)
Depending on various system factors, you may or may not see
much difference in speed from altering the priority level.
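The -P value format is simple enough to express as a short
parser. The Python below is only a sketch of the rules in this
section; parse_priority is a hypothetical name, and it takes the
text that follows "-P".

```python
def parse_priority(value):
    """Parse the text after -P into (class_letter, level)."""
    cls = "r"                          # regular class if no letter given
    if value[:1] in ("c", "r", "i"):   # class letter is case sensitive
        cls, value = value[0], value[1:]
    if value == "*":
        return cls, 32                 # '*' means the highest level
    if not value.isdigit() or not 1 <= int(value) <= 32:
        raise ValueError("priority must be 1-32 or '*'")
    return cls, int(value)
```

So "c*" and "c32" both give ("c", 32), and "16" gives ("r", 16).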
"Fast" search algorithm
-----------------------
FGREP uses two different search techniques. Both are fast, but
one is faster than the other. The faster search will be used
UNLESS any of these conditions are met:
- line numbers or Microsoft (R) format used (-L or -m/M)
- the search is reVerse (-V)
- the search is whitespace insensitive (-W)
- the target string contains wildcards ("?")
For other searches, the "fast" technique is typically 20-40%
faster than the "slow", although we've seen 70% improvements in
some cases. One tester reported that FGREP located a string in
the last line of a 440K file in 0.97 seconds (IBM AT).
The technique has its roots in searching for unusual (less
frequently used) characters, so it will make more of a
difference searching for "squeeze" (q is a little-used letter)
than it will searching for "eat".
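FGREP's actual algorithm is not described beyond the hint above,
so the following Python is only an illustration of the general
idea of anchoring a search on the least common character. It is
not the author's method. The frequency ranking and the names
rarest_index and find_rare_first are assumptions made for the
example.

```python
# Rough English letter ranking, most to least common (an assumption).
COMMON_TO_RARE = "etaoinshrdlcumwfgypbvkjxqz"

def rarest_index(needle):
    """Index of the character in needle expected to be least frequent."""
    def rank(ch):
        pos = COMMON_TO_RARE.find(ch.lower())
        # Anything not in the table is treated as rarer than any letter.
        return pos if pos >= 0 else len(COMMON_TO_RARE)
    return max(range(len(needle)), key=lambda i: rank(needle[i]))

def find_rare_first(haystack, needle):
    """Return the first offset of needle in haystack, or -1."""
    if not needle:
        return 0
    k = rarest_index(needle)
    anchor = needle[k]
    # Scan only for the rare character, then verify the whole target.
    pos = haystack.find(anchor, k)
    while pos != -1:
        start = pos - k
        if haystack.startswith(needle, start):
            return start
        pos = haystack.find(anchor, pos + 1)
    return -1
```

With this ranking, "squeeze" is anchored on its "z", so most of
the text is skipped without any full comparison, while "eat"
offers no uncommon letter to anchor on.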
Notes
-----
1. The -f and -s switches are mutually exclusive. If both are
specified, the last one will be effective.
2. The -M/m and -s switches are mutually exclusive. If both
are specified, the last one will be effective.
| 3. The -v and -z switches are mutually exclusive. If both are
| specified, FGREP terminates with an error.
4. If you specify the -e switch, FGREP will stop processing as
soon as a nonzero errorlevel is determined. The -e switch is
really designed to enable other programs to determine whether or
not a specific file contains a specific string in as little time
as possible. For example, here's an algorithm that will quickly
'touch' all files that do NOT contain a specified string:
for file in (*.*) do
FGREP -e string file
if errorlevel < 1 then touch file
end
5. The -s switch is automatically set when input is taken from
standard input.
6. FGREP optimizes the combination -s0 (suppress headers, no
text) to -e.
7. If you just want to know which files contain a string, use -0;
it saves time because the rest of the file (after the first hit)
is skipped. The combination -0f is particularly efficient
for this as it will simply display a list of files that contain
the string.
8. In either version, there is a maximum line length of 500
characters if any of the -V, -W, -L, or -M switches is used;
otherwise, the maximum line length is about 60K for DOS. For
OS/2 there is an absolute maximum line length of about 28K, but
there are some conditions where line lengths greater than 750
bytes will cause problems. In general, avoid using lines longer
than 750 bytes.
9. FGREP expects standard text files; binary files will yield
weird results unless you use the -b/B binary switch. Word
processor files may or may not work, depending on how they are
formatted. Lines can be terminated by any of CR, LF, or CR+LF.
10. If output is redirected to disk, make sure there is enough
space available. The program does not check.
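The algorithm in note 4 could be written in Python roughly as
shown here. This is a sketch, not a tested recipe for any
particular system: it assumes an "fgrep" command on the path that
behaves as this document describes, and touch_files_without is an
invented name. The run parameter exists so the errorlevel source
can be replaced.

```python
import os
import subprocess

def touch_files_without(target, paths, run=subprocess.call):
    """'Touch' every file that does NOT contain target.

    run(argv) must return the errorlevel: 0 = not found,
    1 = found, 255 = error.
    """
    touched = []
    for path in paths:
        level = run(["fgrep", "-e", target, path])
        if level == 0:                 # errorlevel < 1: string not found
            os.utime(path, None)       # set the file time to now
            touched.append(path)
    return touched
```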
A little note from the author
-----------------------------
I get lots of calls like this:
I love FGREP because it's so fast. It outperforms the next
fastest text search utility I have by two-to-one. If you
could just add (regular expressions | boolean searches), it
would be ideal.
Well, one of the reasons why FGREP is so fast is because it
doesn't have to do lots of complex searching. Because the
search is simple, I've been able to highly optimize the search
algorithms. If I added regular expressions or boolean searches,
the main thing that distinguishes FGREP (its speed) from
other text search programs would go away.
So. I'm always glad to get comments and suggestions on FGREP,
so please keep calling and writing. But I won't be adding
regular expressions or boolean searches. If you need those,
there are plenty of fancier search programs available (including
the real grep). The penalty for greater flexibility is usually
less speed.
Version 2.00
------------
Added multithreaded OS/2 version.
Added recursive search.
Added ability to specify just drive/directory (assumes *.*).
Added -z switch.
Includes path in **File lines if supplied or when path is
generated by a recursive search.
Version 1.71
------------
Adds file lists.
Adds file sharing support.
Corrects two bugs in binary mode display: displaying data beyond
end of file, and skipping bytes immediately after the
target.
Version 1.70
------------
Adds binary search.
Adds target concatenation (target+target).
Cancels if no file specified and not redirected.
Version 1.64
------------
Bug fix release only: fix garbage in -M display from 1.63
Version 1.63
------------
Distinguish between -M (path) and -m (no path) switches.
-M/-m and -S now mutually exclusive.
Allow search for literal '?' character via '\?'.
Version 1.62
------------
Added -M switch.
Increased max length of lines ("slow" search) from 256 to 500.
Version 1.61
------------
Fixed a problem with pausing on > 255 hits (-P not specified).
Version 1.60
------------
Added -P switch.
Fixed a problem with tab characters and -O switch.
Version 1.59
------------
Added -O switch.
Altered -L operations to allow for line numbers > 65535. Line
numbers now right justified in a 6-character field.
Version 1.58
------------
Fixed two problems that were causing endless loops in 1.55-1.57.
Added -A switch.
Version 1.57
------------
Added -X switch.
Versions 1.55/1.56
------------------
"Fast" search algorithm added (thanks to Dave Angel for the
idea and the shove).
Fixed problems with redirected input.
Forced -v (reVerse) search to show blank lines.
Versions 1.50/1.51
------------------
These versions contain no new features but are significantly
faster than earlier versions. Standard (case-insensitive)
searches run about 40% faster than 1.45 (which was 25-30%
faster than 1.40). "Literal" searches (case-sensitive and
spacing-sensitive) are highly optimized and may be as much as
70% faster than 1.45.
Version 1.45
------------
We found a few areas that could be made more efficient. This
version can be as much as 25-30% faster than version 1.40.
The -L (line numbers) option was added, and improvements made to
parameter parsing such that delimiters are not always necessary.
Copyright/License/Warranty
--------------------------
This document and the program files FGREP.EXE and FGREP.COM
("the software") are copyrighted by the author. If you are an
individual, the copyright owner hereby licenses you to: use the
software; make as many copies of the program and documentation
as you wish; give such copies to anyone; and distribute the
software and documentation via electronic means. There is no
charge for any of the above.
However, you are specifically prohibited from charging, or
requesting donations, for any such copies, however made; and
from distributing the software and/or documentation with
commercial products without prior permission. An exception is
granted to not-for-profit user's groups, which are authorized to
charge a small fee (not to exceed $7) for materials, handling,
postage, and general overhead. NO FOR-PROFIT ORGANIZATION IS
AUTHORIZED TO CHARGE ANY AMOUNT FOR DISTRIBUTION OF COPIES OF
THE SOFTWARE OR DOCUMENTATION, OR TO INCLUDE COPIES OF THE
SOFTWARE OR DOCUMENTATION WITH SALES OF THEIR OWN PRODUCTS.
THIS INCLUDES A SPECIFIC PROHIBITION AGAINST FOR-PROFIT
ORGANIZATIONS DISTRIBUTING THE SOFTWARE, EITHER ALONE OR WITH
OTHER SOFTWARE, AND CHARGING A "HANDLING" OR "MATERIALS" FEE OR
ANY OTHER SUCH FEE FOR THE DISTRIBUTION. NO FOR-PROFIT
ORGANIZATION IS AUTHORIZED TO INCLUDE THE SOFTWARE ON ANY MEDIA
FOR WHICH MONEY IS CHARGED. PERIOD.
Businesses, institutions, and governmental entities must obtain
a specific license agreement from The Cove Software Group in
order to use the software.
No copy of the software may be distributed or given away without
this document; and this notice must not be removed.
There is no warranty of any kind, and the copyright owner is not
liable for damages of any kind. By using this free software,
you agree to this.
The software and documentation are:
Copyright (C) 1985-1991 by
Christopher J. Dunford
The Cove Software Group
P.O. Box 1072
Columbia, Maryland 21044
(301) 992-9371
CompuServe 76703,2002 [IBMNET]