Documentation for regexp.library
version 1.0
January 1991
Stephen Moehle
BIX: stephe
USENET: sjm@well.sf.ca.us
COPYRIGHT
Copyright (c) Stephen Moehle 1991
I retain my full copyright on this library. It may be
used for any purpose and is freely distributable.
OVERVIEW
regexp.library provides UN*X style regular expression
pattern matching for both general programs and ARexx programs.
The regular expression code comes from bawk, a public domain awk
clone. The regular expressions available are largely but not
exactly compatible with those found in UN*X programs such as
grep, ex, awk, etc. The following is a description of the
regular expressions:
x An ordinary character (not mentioned below)
matches that character.
'\' The backslash quotes any character.
"\$" matches a dollar-sign.
'^' A circumflex at the beginning of an expression
matches the beginning of a line.
'$' A dollar-sign at the end of an expression
matches the end of a line.
'.' A period matches any single character except
newline.
':x' A colon matches a class of characters described
by the character following it:
':a' ":a" matches any alphabetic;
':d' ":d" matches digits;
':n' ":n" matches alphanumerics;
': ' ": " matches spaces, tabs, and other control
characters, such as newline.
'*' An expression followed by an asterisk matches
zero or more occurrences of that expression:
"fo*" matches "f", "fo", "foo", "fooo", etc.
'+' An expression followed by a plus sign matches
one or more occurrences of that expression:
"fo+" matches "fo", "foo", "fooo", etc.
'-' An expression followed by a minus sign
optionally matches the expression.
'[]' A string enclosed in square brackets matches
any single character in that string, but no
others. If the first character in the string
is a circumflex, the expression matches any
character except newline and the characters in
the string. For example, "[xyz]" matches "xx"
and "zyx", while "[^xyz]" matches "abc" but not
"axb". A range of characters may be specified
by two characters separated by "-". Note that
[a-z] matches alphabetics, while [z-a] never
matches.
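To make the operator descriptions above concrete, here is a minimal
sketch in C of a backtracking matcher for a subset of the syntax:
ordinary characters, '\', '^', '$', '.', the ':x' classes, and the
'*', '+', and '-' suffixes. This is not the code inside regexp.library
(which comes from bawk). The names sketch_search, match_here,
atom_match, and atom_len are made up for illustration. Assumptions:
'[]' sets are left out, '^' anchors only at the start of the text,
": " is approximated by isspace() or iscntrl(), and the suffix
operators apply to the single preceding element and match greedily.

```c
#include <ctype.h>
#include <stddef.h>

/* Pattern bytes used by the element at p: 2 for "\c" and ":x", else 1. */
static size_t atom_len(const char *p)
{
    return ((p[0] == '\\' || p[0] == ':') && p[1] != '\0') ? 2 : 1;
}

/* Does the single pattern element at p match the character c? */
static int atom_match(const char *p, char c)
{
    unsigned char u = (unsigned char)c;

    if (c == '\0')
        return 0;                       /* never match past end of text */
    if (p[0] == '\\' && p[1] != '\0')
        return c == p[1];               /* quoted character */
    if (p[0] == ':' && p[1] != '\0') {
        switch (p[1]) {
        case 'a': return isalpha(u) != 0;
        case 'd': return isdigit(u) != 0;
        case 'n': return isalnum(u) != 0;
        case ' ': return isspace(u) || iscntrl(u);  /* assumption */
        default:  return 0;
        }
    }
    if (p[0] == '.')
        return c != '\n';               /* any character except newline */
    return c == p[0];                   /* ordinary character */
}

/* Does pattern p match at the very start of text t? */
static int match_here(const char *p, const char *t)
{
    size_t n;
    char q;
    const char *s;
    int min;

    if (p[0] == '\0')
        return 1;
    if (p[0] == '$' && p[1] == '\0')
        return *t == '\0' || *t == '\n';

    n = atom_len(p);
    q = p[n];                           /* possible suffix operator */

    if (q == '*' || q == '+') {
        min = (q == '+') ? 1 : 0;
        s = t;
        while (atom_match(p, *s))       /* take as many as possible */
            s++;
        while (s - t >= min) {          /* then back off one at a time */
            if (match_here(p + n + 1, s))
                return 1;
            if (s == t)
                break;
            s--;
        }
        return 0;
    }
    if (q == '-') {                     /* optional: one or zero */
        if (atom_match(p, *t) && match_here(p + n + 1, t + 1))
            return 1;
        return match_here(p + n + 1, t);
    }
    return atom_match(p, *t) && match_here(p + n, t + 1);
}

/* Returns 1 if pattern matches anywhere in text, 0 otherwise. */
int sketch_search(const char *pattern, const char *text)
{
    if (pattern[0] == '^')
        return match_here(pattern + 1, text);
    do {
        if (match_here(pattern, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

With this sketch, sketch_search("fo*", "f") returns 1 and
sketch_search("fo+", "f") returns 0, in line with the '*' and '+'
entries above.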
AREXX
The ARexx part of the library consists of 5 functions
available from any ARexx program. Before this library can be
used, however, it must be added to ARexx's list of function
libraries, either with the ADDLIB function from within ARexx
or with the external rxlib program. This
library has an offset of -30. One caveat to keep in mind when
using these functions is that regular expressions should be no
longer than 200 bytes. Otherwise some internal buffers might be
blown, leading to a sure guru. The functions available are:
REINDEX()
Usage: REINDEX(string, pattern, [{'F' | 'L'}])
Searches <string> for the first or last occurrence of
the regular expression <pattern>. First or last is
determined by whether the third argument is 'F' or 'L'.
First is the default. The returned value is the index
of the matched pattern, 0 if the pattern was not found,
or -1 if pattern was an illegal regular expression.
Examples:
say reindex("abcdefg", "d.*g") ==> 4
say reindex("abcdefg", "e.*c") ==> 0
say reindex("abcdefg", "a[b") ==> -1
REDELSTR()
Usage: REDELSTR(string, pattern, [{'F' | 'L'}])
Deletes the first or last substring of <string> that
matches the regular expression <pattern>. First or
last is determined by whether the third argument is 'F'
or 'L'. First is the default. If no matching
substring was found or <pattern> contains an illegal
regular expression, <string> is returned unchanged.
Examples:
say redelstr("abcdefg", "b[cd]+e") ==> afg
say redelstr("abcdefg", "bz+c") ==> abcdefg
RESUBSTR()
Usage: RESUBSTR(string, pattern, [{'F' | 'L'}])
Returns the first or last substring of <string> that
matches the regular expression <pattern>. First or last
is determined by whether the third argument is 'F' or
'L'. First is the default. If no matching substring
was found or <pattern> contains an illegal regular
expression, an empty string is returned.
Examples:
say resubstr("abcdefg", "b[cd]*e") ==> bcde
say resubstr("abcdefg", "a[b") ==>
RECOMPILE()
Usage: RECOMPILE(pattern)
Compiles the regular expression <pattern> into the form
needed by the REMATCH function. Returns the compiled
string or an empty string if <pattern> contained an
illegal regular expression.
Example:
patbuf = recompile("ab.*f")
REMATCH()
Usage: REMATCH(string, patbuf)
Searches <string> for the compiled regular expression
<patbuf>. Returns 0 if a match was found, -1 if not,
and -2 if <patbuf> is an empty string or invalid. If
multiple searches are to be made using the same regular
expression, this function can be much faster than
REINDEX, since REINDEX has to do the
equivalent of RECOMPILE each time it is called.
Example:
patbuf = recompile("c.*f")
say rematch("abcdefg", patbuf) ==> 0
C LANGUAGE
The regular expression routines are easily called from
Lattice or SAS/C. All that is needed is to OpenLibrary
regexp.library and assign the result to RegExpBase. The header
file regexp.h contains the prototypes and the pragmas for
performing inline calls for the two provided functions and also
an extern declaration for RegExpBase. I have made RegExpBase a
void *, but you can easily make it a struct Library *. I have
included the fd file so that interfaces for other compilers and
languages can be created. The functions provided are:
NAME
RegExpCompile
SYNOPSIS
#include "regexp.h"
success = RegExpCompile(pattern, patbuf)
int success; success or error code
char *pattern; regular expression to be compiled
char *patbuf; buffer in which compiled expression is
to be placed
DESCRIPTION
Compiles the regular expression <pattern> and places the
result in <patbuf>. It is the programmer's responsibility
to ensure that <patbuf> is large enough to hold the compiled
expression. No checking is done for overflow of the buffer.
RETURNS
Returns 0 if successful or a negative error code. See the
enum errs in regexp.h for valid error codes.
NAME
RegExpMatch
SYNOPSIS
#include "regexp.h"
success = RegExpMatch(text, patbuf, index, length, flags)
int success;
char *text; text to search
char *patbuf; a compiled regular expression
int *index; set to offset in <text> of match
int *length; set to length of match in <text>
int flags; FIRST_MATCH or LAST_MATCH
DESCRIPTION
Searches <text> for a match of the compiled regular
expression <patbuf>. RegExpCompile() must be used to create
<patbuf>. If a match is found, <index> is set to the
0-based offset into <text> of the beginning of the match, and
<length> is set to the length of the span of matching
characters. The parameter <flags> controls whether the
first or last match in <text> is taken. The values
FIRST_MATCH and LAST_MATCH are defined in regexp.h.
RETURNS
Returns 0 if successful, -1 if no match found, and -2 if
<patbuf> contains an invalid compiled regular expression.
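As an illustration of how the results of RegExpMatch() might be
consumed, the following sketch in plain C turns a return code into a
message and copies the matched span out of <text> using the 0-based
<index> and <length> values. The helpers match_status and copy_match
are hypothetical and are not part of regexp.library or regexp.h. The
sketch deliberately makes no library calls, so it can be compiled
anywhere. The comment shows where a real call would go.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: describe a RegExpMatch() return code, using
 * the three values listed under RETURNS above. */
const char *match_status(int rc)
{
    switch (rc) {
    case 0:  return "match found";
    case -1: return "no match";
    case -2: return "invalid compiled pattern";
    default: return "unknown return code";
    }
}

/* Hypothetical helper: copy text[index .. index+length-1] into out
 * and NUL-terminate it. Returns the number of characters copied, or
 * -1 if the span lies outside text or out is too small. */
int copy_match(const char *text, int index, int length,
               char *out, size_t outsize)
{
    size_t tlen = strlen(text);

    if (index < 0 || length < 0)
        return -1;
    if ((size_t)index > tlen || (size_t)length > tlen - (size_t)index)
        return -1;
    if ((size_t)length + 1 > outsize)
        return -1;

    memcpy(out, text + index, (size_t)length);
    out[length] = '\0';
    return length;
}

/*
 * Intended use on the Amiga, after a successful RegExpCompile():
 *
 *     rc = RegExpMatch(text, patbuf, &index, &length, FIRST_MATCH);
 *     if (rc == 0)
 *         copy_match(text, index, length, buf, sizeof(buf));
 *     else
 *         puts(match_status(rc));
 */
```

For example, a match at offset 3 with length 4 in "abcdefg" yields
"defg". This is the same span that the ARexx REINDEX example reports
as index 4, since the ARexx index counts from 1.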