WildFile for MS-DOS systems
A *IX SH style file globber written in C
V1.13 Dedicated to the Public Domain
April 22, 1991
J. Kercheval
[72450,3702] -- johnk@wrq.com
05-07-91
This is V1.13 of WildFile, a *IX SH style file globber.
The purpose of this code is to enable the programmer to allow *real*
wildcard file specification. The UNIX (*IX) SH style wildcard system
has been around for decades and there is absolutely no good reason for
its lack of presence within MS/PC DOS and its associated tools.
I submit this without copyright and with the clear understanding that
this code may be used by anyone, for any reason, with any modifications
and without any guarantees, warranties or statements of usability of any
sort.
jbk
*IX SH style wildcard file globbing
===================================
The UNIX style of wildcard globbing (matching files to a wildcard
specification) is quite a bit more flexible than the standard
approach seen on MS/PC DOS machines. The full power of *IX SH
style regular expressions is available for specifying a file name.
For instance:
"*t*" would match to the filenames test.doc, wet.goo,
itsy.bib, foo.tic, etc.
"th?[a-eg]." would match to any file without an extension,
whose first two letters were "th", with any third
letter and whose last letter was a,b,c,d,e or g.
(e.g. thug, thod, thud, etc.)
"*" would match all filenames.
The regular expression syntax is described in detail in the source
code and below.
Implementation
==============
The implementation of the wildcard package is similar in type to the
standard MS/PC DOS function calls for file searches. There is a
find_firstfile call which begins a search initially and a
find_nextfile call which continues a previous search. This approach
will normally yield a very quick port from existing *standard*
implementations of wildcard file searching.
The include file WILDFILE.H does a good job of describing the
specifics required here.
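
As a rough illustration of the find-first / find-next shape described
above, here is a minimal sketch that walks an in-memory list of names
instead of a real directory. Everything in it is an assumption made for
illustration: demo_search, demo_find_first, demo_find_next and
simple_match are hypothetical names, not the declarations found in
WILDFILE.H, and the stand-in matcher only understands '*' and '?'.

```c
#include <stddef.h>

/* Tiny stand-in matcher (hypothetical): supports only '*' and '?'. */
static int simple_match(const char *p, const char *t)
{
    if (*p == '\0')
        return *t == '\0';
    if (*p == '*')  /* '*' matches nothing, or one more character */
        return simple_match(p + 1, t) ||
               (*t != '\0' && simple_match(p, t + 1));
    if (*t == '\0')
        return 0;
    return (*p == '?' || *p == *t) && simple_match(p + 1, t + 1);
}

/* Hypothetical search state; the real layout is whatever WILDFILE.H
   defines, which is not reproduced here. */
struct demo_search {
    const char *pattern;
    const char *const *names;
    size_t count;
    size_t next;   /* index of the next candidate to examine */
};

/* Continue a previous search: return the next matching name or NULL. */
static const char *demo_find_next(struct demo_search *s)
{
    while (s->next < s->count) {
        const char *name = s->names[s->next++];
        if (simple_match(s->pattern, name))
            return name;
    }
    return NULL;
}

/* Begin a search: record the pattern and candidates, return first hit. */
static const char *demo_find_first(struct demo_search *s,
                                   const char *pattern,
                                   const char *const *names, size_t count)
{
    s->pattern = pattern;
    s->names = names;
    s->count = count;
    s->next = 0;
    return demo_find_next(s);
}
```

A caller would typically drive this with a loop of the form
for (n = demo_find_first(&s, pat, names, count); n != NULL;
n = demo_find_next(&s)), which is the same shape as a loop over the
standard DOS search calls.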
WD
==
WD is a quick implementation of a directory lister, intended to show
how the wildfile module is meant to be used. It is a fully functional
program, complete with usage messages and command line argument
parsing.
Languages
=========
WILDFILE (and its associated module MATCH) was developed and
compiled using both Microsoft C V6.00A and Borland C++.
============================================================================
05-07-91
Wildfile uses MATCH V1.10 modified for the specific nits required by
the MS/PC DOS environment. The documentation for MATCH is included
for convenience.
jbk
============================================================================
MATCH110
REGEX Globber (Wild Card Matching)
A *IX SH style pattern matcher written in C
V1.10 Dedicated to the Public Domain
March 12, 1991
J. Kercheval
[72450,3702] -- johnk@wrq.com
*IX SH style Regular Expressions
================================
The *IX command SH is a working shell similar in feel to the MSDOS
shell COMMAND.COM. In point of fact, much of what we see in our
familiar DOS prompt was gleaned from the early UNIX shells available
on many of the machines that people in the computing arena had at the
time DOS and its much maligned precursor CP/M were developed
(although the UNIX shells were, and are, much more flexible and
powerful than those on the current flock of micro machines). The
designers of DOS and CP/M did some fairly strange things with their
command processor and OS. One of those things was to adopt only
selectively the regular expressions allowed within the *IX shells.
Only '?' and '*' were allowed in filenames, and even then the '*' was
allowed only at the end of a pattern; when used to specify the
filename, the '*' did not apply to the extension. This gave rise to
the all too common expression "*.*".
REGEX Globber is a SH pattern matcher. This allows such
specifications as *75.zip or * (equivalent to *.* in DOS lingo).
Expressions such as [a-e]*t would fit the name "apple.crt" or
"catspaw.bat" or "elegant". This allows considerably wider
flexibility in file specification, general parsing or any other
circumstance in which this type of pattern matching is wanted.
A match means that the entire string TEXT is used up in matching the
PATTERN and, conversely, that the matched TEXT uses up the entire
PATTERN.
In the specified pattern string:
`*' matches any sequence of characters (zero or more)
`?' matches any character
`\' suppresses syntactic significance of a special character
[SET] matches any character in the specified set,
[!SET] or [^SET] matches any character not in the specified set.
A set is composed of characters or ranges; a range looks like
'character hyphen character' (as in 0-9 or A-Z). [0-9a-zA-Z_] is the
minimal set of characters allowed in the [..] pattern construct.
Other characters are allowed (e.g. 8 bit characters) if your system
will support them (it almost certainly will).
To suppress the special syntactic significance of any of `[]*?!^-\',
and match the character exactly, precede it with a `\'.
To view several examples of good and bad patterns and text see the
output of MATCHTST.BAT
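
To make the syntax above concrete, here is a minimal sketch of a
matcher for it. It is not the code from the MATCH module: glob_match is
a hypothetical name, it returns only 1 or 0 rather than the detailed
codes, it simply reports "no match" for malformed patterns, and the
DOS-specific handling of extensions is not modeled.

```c
/* Hypothetical helper, not the MATCH module's code: returns 1 if the
   whole of text t is matched by the whole of pattern p, else 0. */
static int glob_match(const char *p, const char *t)
{
    for (; *p != '\0'; p++, t++) {
        switch (*p) {
        case '*':
            while (*p == '*')          /* collapse runs of '*' */
                p++;
            if (*p == '\0')            /* trailing '*' takes the rest */
                return 1;
            for (; *t != '\0'; t++)    /* try every possible split */
                if (glob_match(p, t))
                    return 1;
            return 0;
        case '?':
            if (*t == '\0')            /* '?' needs one character */
                return 0;
            break;
        case '[': {
            int negate = 0, found = 0;
            if (*t == '\0')
                return 0;
            p++;
            if (*p == '!' || *p == '^') {
                negate = 1;
                p++;
            }
            while (*p != ']') {
                unsigned char lo, hi, c = (unsigned char)*t;
                if (*p == '\\')        /* escaped set member */
                    p++;
                if (*p == '\0')        /* no closing bracket */
                    return 0;
                lo = hi = (unsigned char)*p++;
                if (*p == '-') {       /* range such as a-e */
                    p++;
                    if (*p == ']')     /* range with no end, e.g. [a-] */
                        return 0;
                    if (*p == '\\')
                        p++;
                    if (*p == '\0')
                        return 0;
                    hi = (unsigned char)*p++;
                }
                if (c >= lo && c <= hi)
                    found = 1;
            }
            if (found == negate)       /* also rejects the empty set [] */
                return 0;
            break;                     /* p now rests on the ']' */
        }
        case '\\':
            p++;                       /* take next character literally */
            if (*p == '\0')            /* dangling escape */
                return 0;
            /* fall through */
        default:
            if (*p != *t)
                return 0;
        }
    }
    return *t == '\0';                 /* text must be used up as well */
}
```

With this sketch, glob_match("[a-e]*t", "apple.crt") and
glob_match("*75.zip", "wild75.zip") both succeed, while
glob_match("a\\*b", "axb") fails because the escaped '*' is taken
literally.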
MATCH() and MATCHE()
====================
The match module as written has two parsing routines: matche() and
match(). Since match() is simply a call to matche() with its output
mapped to a BOOLEAN value (i.e. TRUE if the pattern matches, FALSE
otherwise), I will concentrate my explanations here on matche().
The purpose of matche() is to match a pattern against a string of
text (usually a file name or specification). The match routine has
extensive pattern validity checking built into it as part of the
parser and allows for a robust pattern match.
The parser returns an error code of type int. The error code will be
one of the following values (defined in match.h):
MATCH_PATTERN - bad or malformed pattern
MATCH_LITERAL - match failed on character match (standard
                character)
MATCH_RANGE   - match failure on character range ([..] construct)
MATCH_ABORT   - premature end of text string (pattern longer
                than text string)
MATCH_END     - premature end of pattern string (text longer
                than pattern called for)
MATCH_VALID   - valid match using pattern
The functions are declared as follows:
BOOLEAN match (char *pattern, char *text);
int matche(register char *pattern, register char *text);
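
The relationship between the two routines, and one way a caller might
report the detailed codes, can be sketched as follows. This is an
illustration only: the enum values are placeholders (the real ones are
defined in match.h and may differ), and code_is_match and
match_code_text are hypothetical helpers that are not part of the
module.

```c
/* Placeholder values: the names come from the list above, but the
   real numeric values live in match.h and may differ. */
enum {
    MATCH_VALID = 1,
    MATCH_END,
    MATCH_ABORT,
    MATCH_RANGE,
    MATCH_LITERAL,
    MATCH_PATTERN
};

/* Hypothetical helper: collapse a matche()-style code to TRUE (1) or
   FALSE (0), which is how the text describes match(). */
static int code_is_match(int code)
{
    return code == MATCH_VALID;
}

/* Hypothetical helper: short description of a code for diagnostics. */
static const char *match_code_text(int code)
{
    switch (code) {
    case MATCH_VALID:   return "valid match";
    case MATCH_END:     return "text longer than pattern";
    case MATCH_ABORT:   return "pattern longer than text";
    case MATCH_RANGE:   return "no match in [..] construct";
    case MATCH_LITERAL: return "no match on literal character";
    case MATCH_PATTERN: return "bad pattern";
    default:            return "unknown code";
    }
}
```

A caller that only needs yes or no would use match(); one that wants
to tell the user why a name was rejected would call matche() and pass
the result to something like match_code_text().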
IS_VALID_PATTERN() and IS_PATTERN()
===================================
There are two routines for determining properties of a pattern
string. The first, is_pattern(), is designed simply to determine if
some character exists within the text which is consistent with a SH
regular expression (this function returns TRUE if so and FALSE if
not). The second, is_valid_pattern(), is designed to check the
validity of a given pattern string (TRUE return if valid, FALSE if
not). By 'validity', I mean well formed or syntactically correct.
In addition, is_valid_pattern() has as one of its parameters a
return code for determining the type of error found in the pattern,
if one exists. The error codes are as follows and are defined in
match.h:
PATTERN_VALID - pattern is well formed
PATTERN_ESC   - pattern has invalid literal escape ('\' at end of
                pattern)
PATTERN_RANGE - [..] construct has no end range in a '-' pair
                (e.g. [a-])
PATTERN_CLOSE - [..] construct has no end bracket (e.g. [abc-g )
PATTERN_EMPTY - [..] construct is empty (e.g. [])
The functions are declared as follows:
BOOLEAN is_valid_pattern (char *pattern, int *error_type);
BOOLEAN is_pattern (char *pattern);
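
The kind of check is_valid_pattern() performs can be sketched as a
single pass over the pattern. The version below is an assumption-laden
illustration, not the module's code: check_pattern is a hypothetical
name, it returns the code directly instead of filling in an error_type
parameter, and the enum values are placeholders for those in match.h.

```c
/* Placeholder values: the names follow the list above, but the real
   numeric values are defined in match.h and may differ. */
enum {
    PATTERN_VALID = 0,
    PATTERN_ESC,
    PATTERN_RANGE,
    PATTERN_CLOSE,
    PATTERN_EMPTY
};

/* Hypothetical helper: scan pattern p once and report the first
   syntax problem found, or PATTERN_VALID if there is none. */
static int check_pattern(const char *p)
{
    while (*p != '\0') {
        switch (*p) {
        case '\\':
            if (p[1] == '\0')          /* '\' at end of pattern */
                return PATTERN_ESC;
            p += 2;                    /* skip escape and its target */
            break;
        case '[':
            p++;
            if (*p == '!' || *p == '^')
                p++;
            if (*p == ']')             /* nothing inside the set */
                return PATTERN_EMPTY;
            while (*p != ']') {
                if (*p == '\0')        /* ran out before ']' */
                    return PATTERN_CLOSE;
                if (*p == '\\' && *++p == '\0')
                    return PATTERN_ESC;
                p++;                   /* consume member or range start */
                if (*p == '-') {
                    p++;
                    if (*p == '\0' || *p == ']')
                        return PATTERN_RANGE;   /* e.g. [a-] */
                    if (*p == '\\' && *++p == '\0')
                        return PATTERN_ESC;
                    p++;               /* consume range end */
                }
            }
            p++;                       /* skip the closing ']' */
            break;
        default:
            p++;                       /* ordinary character, '*' or '?' */
        }
    }
    return PATTERN_VALID;
}
```

Under these assumptions, check_pattern("[a-]") yields PATTERN_RANGE,
check_pattern("[abc-g ") yields PATTERN_CLOSE and check_pattern("[]")
yields PATTERN_EMPTY, matching the examples in the list above.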