- '\"
- '\" Copyright (c) 1993 The Regents of the University of California.
- '\" All rights reserved.
- '\"
- '\" Permission is hereby granted, without written agreement and without
- '\" license or royalty fees, to use, copy, modify, and distribute this
- '\" documentation for any purpose, provided that the above copyright
- '\" notice and the following two paragraphs appear in all copies.
- '\"
- '\" IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY
- '\" FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
- '\" ARISING OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF THE UNIVERSITY OF
- '\" CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- '\"
- '\" THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
- '\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
- '\" AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
- '\" ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATION TO
- '\" PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
- '\"
- '\" $Header: /user6/ouster/tcl/man/RCS/regexp.n,v 1.2 93/06/17 13:31:37 ouster Exp $ SPRITE (Berkeley)
- '\"
- .so man.macros
- .HS regexp tcl
- .BS
- '\" Note: do not modify the .SH NAME line immediately below!
- .SH NAME
- regexp \- Match a regular expression against a string
- .SH SYNOPSIS
- \fBregexp \fR?\fIswitches\fR? \fIexp string \fR?\fImatchVar\fR? ?\fIsubMatchVar subMatchVar ...\fR?
- .BE
-
- .SH DESCRIPTION
- .PP
- Determines whether the regular expression \fIexp\fR matches part or
- all of \fIstring\fR and returns 1 if it does, 0 if it doesn't.
- .LP
- If additional arguments are specified after \fIstring\fR then they
- are treated as the names of variables in which to return
- information about which part(s) of \fIstring\fR matched \fIexp\fR.
- \fIMatchVar\fR will be set to the range of \fIstring\fR that
- matched all of \fIexp\fR. The first \fIsubMatchVar\fR will contain
- the characters in \fIstring\fR that matched the leftmost parenthesized
- subexpression within \fIexp\fR, the next \fIsubMatchVar\fR will
- contain the characters that matched the next parenthesized
- subexpression to the right in \fIexp\fR, and so on.
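- .PP
- As a minimal sketch of how these variables are filled in, consider
- the command below. The pattern, the input string, and the variable
- names \fBmatch\fR, \fBname\fR, and \fBvalue\fR are chosen only for
- illustration:
- .DS
- \fBregexp {([a\-z]+)=([0\-9]+)} "set width=120 now" match name value\fR
- .DE
- Here the command should return 1, setting \fBmatch\fR to ``width=120'',
- \fBname\fR to ``width'', and \fBvalue\fR to ``120''.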
- .LP
- If the initial arguments to \fBregexp\fR start with \fB\-\fR then
- .VS
- they are treated as switches. The following switches are
- currently supported:
- .TP 10
- \fB\-nocase\fR
- Causes upper-case characters in \fIstring\fR to be treated as
- lower case during the matching process.
- .TP 10
- \fB\-indices\fR
- Changes what is stored in \fImatchVar\fR and the \fIsubMatchVar\fRs.
- Instead of storing the matching characters from \fIstring\fR,
- each variable
- will contain a list of two decimal strings giving the indices
- in \fIstring\fR of the first and last characters in the matching
- range of characters.
- .TP 10
- \fB\-\|\-\fR
- Marks the end of switches. The argument following this one will
- be treated as \fIexp\fR even if it starts with a \fB\-\fR.
- .VE
- .LP
- If there are more \fIsubMatchVar\fRs than parenthesized
- subexpressions within \fIexp\fR, or if a particular subexpression
- in \fIexp\fR doesn't match the string (e.g. because it was in a
- portion of the expression that wasn't matched), then the corresponding
- \fIsubMatchVar\fR will be set to ``\fB\-1 \-1\fR'' if \fB\-indices\fR
- has been specified or to an empty string otherwise.
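- .PP
- As an illustration (the pattern, the input string, and the variable
- names below are made up for this example), consider a pattern whose
- first parenthesized subexpression takes no part in the match:
- .DS
- \fBregexp \-indices {(a)|(b)} xxb match first second\fR
- .DE
- The command should return 1 and set \fBfirst\fR to ``\-1 \-1'', since
- ``(a)'' matched nothing, and \fBsecond\fR to ``2 2'', the index of
- the ``b''.
- Without \fB\-indices\fR, \fBfirst\fR would be an empty string and
- \fBsecond\fR would be ``b''.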
-
- .SH "REGULAR EXPRESSIONS"
- .PP
- Regular expressions are implemented using Henry Spencer's package
- (thanks, Henry!),
- and the description of regular expressions below is copied verbatim
- from his manual entry.
- .PP
- A regular expression is zero or more \fIbranches\fR, separated by ``|''.
- It matches anything that matches one of the branches.
- .PP
- A branch is zero or more \fIpieces\fR, concatenated.
- It matches a match for the first, followed by a match for the second, etc.
- .PP
- A piece is an \fIatom\fR possibly followed by ``*'', ``+'', or ``?''.
- An atom followed by ``*'' matches a sequence of 0 or more matches of the atom.
- An atom followed by ``+'' matches a sequence of 1 or more matches of the atom.
- An atom followed by ``?'' matches a match of the atom, or the null string.
- .PP
- An atom is a regular expression in parentheses (matching a match for the
- regular expression), a \fIrange\fR (see below), ``.''
- (matching any single character), ``^'' (matching the null string at the
- beginning of the input string), ``$'' (matching the null string at the
- end of the input string), a ``\e'' followed by a single character (matching
- that character), or a single character with no other significance
- (matching that character).
- .PP
- A \fIrange\fR is a sequence of characters enclosed in ``[]''.
- It normally matches any single character from the sequence.
- If the sequence begins with ``^'',
- it matches any single character \fInot\fR from the rest of the sequence.
- If two characters in the sequence are separated by ``\-'', this is shorthand
- for the full list of ASCII characters between them
- (e.g. ``[0\-9]'' matches any decimal digit).
- To include a literal ``]'' in the sequence, make it the first character
- (following a possible ``^'').
- To include a literal ``\-'', make it the first or last character.
- .PP
- If a regular expression could match two different parts of a string,
- it will match the one which begins earliest.
- If both begin in the same place but match different lengths, or match
- the same length in different ways, life gets messier, as follows.
- .PP
- In general, the possibilities in a list of branches are considered in
- left-to-right order, the possibilities for ``*'', ``+'', and ``?'' are
- considered longest-first, nested constructs are considered from the
- outermost in, and concatenated constructs are considered leftmost-first.
- The match that will be chosen is the one that uses the earliest
- possibility in the first choice that has to be made.
- If there is more than one choice, the next will be made in the same manner
- (earliest possibility) subject to the decision on the first choice.
- And so forth.
- .PP
- For example, ``(ab|a)b*c'' could match ``abc'' in one of two ways.
- The first choice is between ``ab'' and ``a''; since ``ab'' is earlier, and does
- lead to a successful overall match, it is chosen.
- Since the ``b'' is already spoken for,
- the ``b*'' must match its last possibility\(emthe empty string\(emsince
- it must respect the earlier choice.
- .PP
- In the particular case where no ``|''s are present and there is only one
- ``*'', ``+'', or ``?'', the net effect is that the longest possible
- match will be chosen.
- So ``ab*'', presented with ``xabbbby'', will match ``abbbb''.
- Note that if ``ab*'' is tried against ``xabyabbbz'', it
- will match ``ab'' just after ``x'', due to the begins-earliest rule.
- (In effect, the decision on where to start the match is the first choice
- to be made, hence subsequent choices must respect it even if this leads them
- to less-preferred alternatives.)
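- .PP
- The examples above can be tried with the \fBregexp\fR command.
- In this sketch the variable names are arbitrary, and parentheses
- have been added around ``b*'' in the first pattern solely so that
- its match can be inspected:
- .DS
- \fBregexp {(ab|a)(b*)c} abc match first second\fR
- \fBregexp {ab*} xabyabbbz match2\fR
- .DE
- Following the rules above, the first command should set \fBfirst\fR
- to ``ab'' and \fBsecond\fR to an empty string, and the second
- command should set \fBmatch2\fR to ``ab''.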
-
- .SH KEYWORDS
- match, regular expression, string
-