- Documentation for regexp.library
- version 1.0
- January 1991
- Stephen Moehle
- BIX: stephe
- USENET: sjm@well.sf.ca.us
-
-
-
- COPYRIGHT
-
- Copyright (c) Stephen Moehle 1991
-
- I retain my full copyright on this library. It may be
- used for any purpose and is freely distributable.
-
-
-
- OVERVIEW
-
- regexp.library provides UN*X style regular expression
- pattern matching for both general programs and ARexx programs.
- The regular expression code comes from bawk, a public domain awk
- clone. The regular expressions available are largely but not
- exactly compatible with those found in UN*X programs such as
- grep, ex, awk, etc. The following is a description of the
- regular expressions:
-
- x An ordinary character (not mentioned below)
- matches that character.
- '\' A backslash quotes the character that follows it:
- "\$" matches a dollar-sign.
- '^' A circumflex at the beginning of an expression
- matches the beginning of a line.
- '$' A dollar-sign at the end of an expression
- matches the end of a line.
- '.' A period matches any single character except
- newline.
- ':x' A colon matches a class of characters described
- by the character following it:
- ':a' ":a" matches any alphabetic character;
- ':d' ":d" matches digits;
- ':n' ":n" matches alphanumerics;
- ': ' ": " matches spaces, tabs, and other control
- characters, such as newline.
- '*' An expression followed by an asterisk matches
- zero or more occurrences of that expression:
- "fo*" matches "f", "fo", "foo", "fooo", etc.
- '+' An expression followed by a plus sign matches
- one or more occurrences of that expression:
- "fo+" matches "fo", "foo", "fooo", etc.
- '-' An expression followed by a minus sign
- optionally matches the expression (zero or one
- occurrence): "fo-" matches "f" and "fo".
- '[]' A string enclosed in square brackets matches
- any single character in that string, but no
- others. If the first character in the string
- is a circumflex, the expression matches any
- character except newline and the characters in
- the string. For example, every character of
- "xx" and "zyx" matches "[xyz]", and every
- character of "abc" (but not of "axb") matches
- "[^xyz]". A range of characters may be specified
- by two characters separated by "-". Note that
- [a-z] matches lower-case letters, while [z-a]
- never matches.
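The operators above can be made concrete with a small backtracking matcher. The following is a hypothetical sketch in portable C, in the style of the Kernighan and Pike matcher from The Practice of Programming; it is not the bawk code inside regexp.library. It covers only literals, '\' quoting, '.', '^', '$', and '*', '+', '-' applied to a single character, and it omits the ':x' classes and '[]'. The function name re_search is made up for illustration.

```c
/* Hypothetical sketch, NOT the bawk code used by regexp.library.
   Supports literals, '\' quoting, '.', '^', '$', and the postfix
   operators '*', '+' and '-' applied to a single character. */

/* Length of the single-character atom at re: 2 for "\x", else 1. */
static int atom_len(const char *re)
{
    return (re[0] == '\\' && re[1] != '\0') ? 2 : 1;
}

/* Does the atom at re match character c? '.' matches anything but newline. */
static int atom_matches(const char *re, char c)
{
    if (c == '\0')
        return 0;
    if (re[0] == '\\' && re[1] != '\0')
        return re[1] == c;
    if (re[0] == '.')
        return c != '\n';
    return re[0] == c;
}

static int match_here(const char *re, const char *text);

/* Match between min and max (max < 0: unbounded) copies of the atom,
   then the rest of the expression. */
static int match_repeat(const char *atom, int min, int max,
                        const char *rest, const char *text)
{
    int n = 0;
    /* Greedily consume as many matching characters as allowed. */
    while ((max < 0 || n < max) && atom_matches(atom, text[n]))
        n++;
    /* Back off one character at a time until the rest matches. */
    for (; n >= min; n--)
        if (match_here(rest, text + n))
            return 1;
    return 0;
}

/* Match re against the beginning of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    /* '$' is special only at the end of the expression. */
    if (re[0] == '$' && re[1] == '\0')
        return text[0] == '\0' || text[0] == '\n';
    int len = atom_len(re);
    char op = re[len];
    if (op == '*')
        return match_repeat(re, 0, -1, re + len + 1, text);
    if (op == '+')
        return match_repeat(re, 1, -1, re + len + 1, text);
    if (op == '-')
        return match_repeat(re, 0, 1, re + len + 1, text);
    if (atom_matches(re, text[0]))
        return match_here(re + len, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, else 0. */
int re_search(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    /* Try every starting position, including the empty suffix. */
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, re_search("fo-", "f") and re_search("d.*g", "abcdefg") both return 1, while re_search("a.c", "a\nc") returns 0 because '.' does not match a newline. The greedy-then-back-off loop in match_repeat is what lets "d.*g" give back characters so the trailing "g" can still match.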
-
-
-
- AREXX
-
- The ARexx part of the library consists of five functions
- available from any ARexx program. Before they can be used,
- the library must be added to ARexx's list of function
- libraries, either with the ADDLIB function from within ARexx
- or with the external rxlib program. The library's function
- offset is -30, for example:
-
- call addlib('regexp.library', 0, -30, 0)
-
- Keep regular expressions no longer than 200 bytes; longer
- patterns can overflow internal buffers and will almost
- certainly crash the machine (a "guru"). The functions
- available are:
-
- REINDEX()
-
- Usage: REINDEX(string, pattern, [{'F' | 'L'}])
- Searches <string> for the first or last occurrence of
- the regular expression <pattern>. First or last is
- determined by whether the third argument is 'F' or 'L'.
- First is the default. The returned value is the 1-based
- index of the start of the match, 0 if the pattern was not
- found, or -1 if <pattern> is an illegal regular expression.
- Examples:
-
- say reindex("abcdefg", "d.*g") ==> 4
- say reindex("abcdefg", "e.*c") ==> 0
- say reindex("abcdefg", "a[b") ==> -1
-
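REINDEX's result is 1-based (4 for the "d" in "abcdefg"), while RegExpMatch in the C interface reports a 0-based offset. The sketch below, with the hypothetical name literal_index, shows that conversion and the 'F'/'L' choice using plain substring search (strstr) in place of a regular expression, so it is only an analog of REINDEX, not its implementation.

```c
#include <string.h>

/* Hypothetical helper: a literal-substring analog of REINDEX showing the
   1-based result and the 'F'/'L' choice. It does NOT handle regular
   expressions; strstr stands in for a real pattern match here. */
int literal_index(const char *string, const char *needle, char which)
{
    const char *hit = NULL;
    const char *p = string;

    if (*needle == '\0')
        return 0;               /* treat an empty needle as "not found" */

    /* Walk the occurrences; stop at the first unless 'L' was requested. */
    while ((p = strstr(p, needle)) != NULL) {
        hit = p;
        if (which != 'L')
            break;
        p++;                    /* allow overlapping occurrences */
    }
    /* Convert the 0-based pointer offset to a REINDEX-style 1-based index. */
    return hit ? (int)(hit - string) + 1 : 0;
}
```

So literal_index("abcabc", "bc", 'F') gives 2 and literal_index("abcabc", "bc", 'L') gives 5, mirroring how REINDEX's third argument selects the first or last occurrence.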
- REDELSTR()
-
- Usage: REDELSTR(string, pattern, [{'F' | 'L'}])
- Deletes the first or last substring of <string> that
- matches the regular expression <pattern>. First or
- last is determined by whether the third argument is 'F'
- or 'L'. First is the default. If no matching
- substring was found or <pattern> contains an illegal
- regular expression, <string> is returned unchanged.
- Examples:
-
- say redelstr("abcdefg", "b[cd]+e") ==> afg
- say redelstr("abcdefg", "bz+c") ==> abcdefg
-
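Given the 0-based index and length that RegExpMatch reports, deleting the matched span is a single memmove. The helper below (hypothetical name delete_span) sketches that step, assuming the offsets came from a successful match; it is not the library's code. For the example above, "b[cd]+e" matches "bcde" at offset 1 with length 4, leaving "afg".

```c
#include <string.h>

/* Hypothetical helper: remove the span [index, index + length) from text
   in place, roughly what REDELSTR does once a match has been found. */
void delete_span(char *text, int index, int length)
{
    size_t n = strlen(text);

    if (index < 0 || length <= 0 || (size_t)index >= n)
        return;                                 /* nothing to delete */
    if ((size_t)index + (size_t)length > n)
        length = (int)(n - (size_t)index);      /* clamp to end of string */
    /* Shift the tail, including the terminating NUL, left over the span. */
    memmove(text + index, text + index + length,
            n - (size_t)index - (size_t)length + 1);
}
```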
- RESUBSTR()
-
- Usage: RESUBSTR(string, pattern, [{'F' | 'L'}])
- Returns the first or last substring of <string> that
- matches the regular expression <pattern>. First or last
- is determined by whether the third argument is 'F' or
- 'L'. First is the default. If no matching substring
- was found or <pattern> contains an illegal regular
- expression, an empty string is returned.
- Examples:
-
- say resubstr("abcdefg", "b[cd]*e") ==> bcde
- say resubstr("abcdefg", "a[b") ==>
-
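Likewise, RESUBSTR's result corresponds to copying <length> characters starting at <index>. This hypothetical copy_span helper sketches that in C; it trusts index and length to lie within <text> (as they do after a successful match) and refuses to overflow the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the span [index, index + length) of text into
   out, whose capacity outsize includes room for the NUL. Returns 0 on
   success or -1 if the arguments are bad or out is too small. */
int copy_span(char *out, size_t outsize, const char *text,
              int index, int length)
{
    if (index < 0 || length < 0 || (size_t)length + 1 > outsize)
        return -1;
    memcpy(out, text + index, (size_t)length);
    out[length] = '\0';
    return 0;
}
```

With the offsets from the "b[cd]*e" example (index 1, length 4), the copied string is "bcde".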
- RECOMPILE()
-
- Usage: RECOMPILE(pattern)
- Compiles the regular expression <pattern> into the form
- needed by the REMATCH function. Returns the compiled
- string or an empty string if <pattern> contained an
- illegal regular expression.
- Example:
-
- patbuf = recompile("ab.*f")
-
- REMATCH()
-
- Usage: REMATCH(string, patbuf)
- Searches <string> for the compiled regular expression
- <patbuf>. Returns 0 if a match was found, -1 if not,
- and -2 if <patbuf> is an empty string or invalid. If
- multiple searches are to be made using the same regular
- expression, RECOMPILE followed by repeated REMATCH calls
- can be much faster than REINDEX, since REINDEX must do
- the equivalent of RECOMPILE each time it is called.
- Example:
-
- patbuf = recompile("c.*f")
- say rematch("abcdefg", patbuf) ==> 0
-
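The compile-once, match-many idea behind RECOMPILE and REMATCH can be shown in portable C with the POSIX regcomp() and regexec() functions. This is a swapped-in library, not regexp.library, and POSIX extended syntax differs from the syntax described above (for instance, '-' is not an optional operator there; POSIX uses '?'). The pattern "c.*f" happens to mean the same thing in both. The function name count_matching is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the n lines contain a match for
   pattern. The pattern is compiled once and reused for every line, the
   same idea as RECOMPILE/REMATCH, but using POSIX regex, NOT
   regexp.library. Returns -2 if the pattern does not compile. */
int count_matching(const char *pattern, const char *const lines[], size_t n)
{
    regex_t re;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -2;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)    /* 0 means matched */
            count++;
    regfree(&re);       /* release the compiled pattern */
    return count;
}
```

The cost of compiling is paid once regardless of how many lines are searched, which is why the REMATCH route pays off for repeated searches.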
-
-
-
- C LANGUAGE
-
- The regular expression routines are easily called from
- Lattice or SAS/C: open regexp.library with OpenLibrary() and
- assign the result to RegExpBase. The header
- file regexp.h contains the prototypes and the pragmas for
- performing inline calls for the two provided functions and also
- an extern declaration for RegExpBase. I have made RegExpBase a
- void *, but you can easily make it a struct Library *. I have
- included the fd file so that interfaces for other compilers and
- languages can be created. The functions provided are:
-
- NAME
- RegExpCompile
-
- SYNOPSIS
- #include "regexp.h"
-
- success = RegExpCompile(pattern, patbuf)
-
- int success; success or error code
- char *pattern; regular expression to be compiled
- char *patbuf; buffer in which compiled expression is
- to be placed
-
- DESCRIPTION
- Compiles the regular expression <pattern> and places the
- result in <patbuf>. It is the programmer's responsibility
- to ensure that <patbuf> is large enough to hold the compiled
- expression. No checking is done for overflow of the buffer.
-
- RETURNS
- Returns 0 if successful or a negative error code. See the
- enum errs in regexp.h for valid error codes.
-
-
- NAME
- RegExpMatch
-
- SYNOPSIS
- #include "regexp.h"
-
- success = RegExpMatch(text, patbuf, index, length, flags)
-
- int success; 0, -1, or -2 (see RETURNS)
- char *text; text to search
- char *patbuf; a compiled regular expression
- int *index; set to offset in <text> of match
- int *length; set to length of match in <text>
- int flags; FIRST_MATCH or LAST_MATCH
-
- DESCRIPTION
- Searches <text> for a match of the compiled regular
- expression <patbuf>. RegExpCompile() must be used to create
- <patbuf>. If a match is found, <index> is set to the
- zero-based offset into <text> of the beginning of the match,
- and <length> is set to the length of the span of matching
- characters. The parameter <flags> controls whether the
- first or last match in <text> is taken. The values
- FIRST_MATCH and LAST_MATCH are defined in regexp.h.
-
- RETURNS
- Returns 0 if successful, -1 if no match found, and -2 if
- <patbuf> contains an invalid compiled regular expression.
-