The World of Computer Software

Text File | 1992-11-09 | 5KB | 115 lines

Regular Expressions
------- -----------

Regular expression syntax.

[1]  char      Matches itself, unless it is a special character
               (meta-character): . [ ] * + ^ $
               If case-fold-search is TRUE, char will match both upper
               and lower case.

[2]  .         Matches any character.

[3]  \         Matches the character following it, except when
               followed by one of ( ) 1 2 3 4 5 6 7 8 9 0 < >
               a d n w W or a blank (see [7] - [15]).
               It is used as an escape character for all other
               meta-characters, and for itself. When used in a set
               ([4]), it is treated as an ordinary character.

[4]  [set]     Matches one of the characters in the set. If the first
               character in the set is ^, it matches a character NOT
               in the set. The shorthand S-E specifies the characters
               from S up to E, inclusive. Note that case-fold-search
               has no effect on sets.
               To include - in a set: [-...], [^-...] or [...-]
               To include ] in a set: []...], [^]...] or [^]-...]

               Example        Matches
               -------        -------
               [a-z]          Any lowercase alpha
               [^]-]          Any char except ] and -
               [^A-Z]         Any char except uppercase alpha
               [a-zA-Z0-9]    Any alphanumeric

               [a-b-c] == [a-bb-c] == [a-c]  Matches a or b or c.
               [a-a] == [a] == a  (ignoring case-fold-search).
               [-abc] == [abc-]   Matches a or b or c or -.
               []] == ]           Matches ].
               [-]]               Matches ONLY -]. This is a set
                                  ([-]) and a character (]).
               [z-a]              Error, an empty range.

[5]  *         Any regular expression of the form [1] to [4], followed
               by the closure character (*), matches zero or more
               occurrences of that form.

[6]  +         Same as [5], except it matches one or more.

[7]  \( \)     A regular expression of the form [1] to [10], enclosed
               as \(form\), matches what form matches. The enclosure
               creates a set of tags, used for [8] and for pattern
               substitution. The tagged forms are numbered starting
               from 1.

[8]  \1 ... \9 A \ followed by a digit 1 to 9 matches whatever a
               previously tagged regular expression ([7]) matched.

[9]  \<        Matches the beginning of a word.
     \>        Matches the end of a word.
               See (modify-syntax-entry) for what a word is.

[10] \a        Matches an alpha character (same as [a-zA-Z]).
[11] \d        Matches a digit (same as [0-9]).

[12] \n        Matches an alphanumeric character (same as [a-zA-Z0-9]).

[13] \<blank>  Matches whitespace.

[14] \w        Matches a word character (as defined by the syntax
               tables).

[15] \W        Matches a non-word character (as defined by the syntax
               tables).

[16]           A composite regular expression xy, where x and y are of
               the form [1] to [10], matches the longest match of x
               followed by a match for y.

[17] ^ $       A regular expression starting with a ^ character and/or
               ending with a $ character is anchored: ^ restricts the
               match to the beginning of the line and $ to the end of
               the line. Elsewhere in the pattern, ^ and $ are treated
               as ordinary characters.


RE Substitutions
-- -------------

In the replace string, the following characters have special meaning:

[1]  &         Substitute the entire matched string in the destination.

[2]  \n        Substitute the substring matched by the tagged
               subpattern numbered n, where n is between 1 and 9,
               inclusive.

[3]  \char     Treat the next character literally, unless the
               character is a digit ([2]).

Otherwise the text is inserted verbatim.


EXAMPLES

    foo*.*          matches: fo foo fooo foobar fobar foxx ...
    fo[ob]a[rz]     matches: fobar fooar fobaz fooaz
    foo\\+          matches: foo\ foo\\ foo\\\ ...
    \(foo\)[1-3]\1  (same as foo[1-3]foo, but takes less internal space)
                    matches: foo1foo foo2foo foo3foo
    \(fo.*\)-\1     matches: foo-foo fo-fo fob-fob foobar-foobar ...


DIAGNOSTICS

    No previous regular expression, Empty closure, Illegal closure,
    Cyclical reference, Undetermined reference, Unmatched \(,
    Missing ], Null pattern inside \(\), Null pattern inside \<\>,
    Too many \(\) pairs, Unmatched \).


AUTHOR:  Ozan S. Yigit (oz)
TWEAKER: Craig Durland


BUGS

    The internal storage for the compiled regular expression is not
    checked for overflows. Currently, it is 512 bytes. If your REs are
    not much longer than 80 characters, you will not have any problems.

    A pattern will not cross lines.

    If a line of the buffer is very long, part of it might be ignored.

    [8] only works if the referenced tagged RE is made of constants, and
    case matters no matter what case-fold-search is set to; i.e., no RE
    constructs inside the tag. Yes, pretty worthless, and it should be
    fixed.

    Others, no doubt.
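The set rules in [4] can be checked against the examples listed there with a small parser. What follows is a hypothetical Python sketch, not the editor's C code, and the names parse_set and set_matches are invented for illustration. It assumes that a leading ] is recognized before a leading -, that a - between two characters starts its range at the character just before it (one way for [a-b-c] to come out equal to [a-c]), and that sets never fold case.

```python
# Hypothetical sketch of the set rules in [4]; not the editor's implementation.


def parse_set(pat, i=0):
    """Parse the set that starts at pat[i] == '['.

    Returns (members, negated, next_index), where next_index is the
    position just past the closing ']'.
    """
    if pat[i] != "[":
        raise ValueError("a set must start with '['")
    i += 1
    negated = i < len(pat) and pat[i] == "^"
    if negated:
        i += 1
    members = set()
    # A ']' right after '[' or '[^' is an ordinary character ...
    if i < len(pat) and pat[i] == "]":
        members.add("]")
        i += 1
    # ... and so is a '-' in that position (assumed to be checked second).
    if i < len(pat) and pat[i] == "-":
        members.add("-")
        i += 1
    while i < len(pat) and pat[i] != "]":
        if pat[i] == "-" and i + 1 < len(pat) and pat[i + 1] != "]":
            # S-E range; S is the character just before the '-',
            # so [a-b-c] expands to a, b, c.
            start, end = pat[i - 1], pat[i + 1]
            if end < start:
                raise ValueError(f"empty range {start}-{end}")
            members.update(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 2
        else:
            # Everything else, including '\' and a final '-', is ordinary.
            members.add(pat[i])
            i += 1
    if i >= len(pat):
        raise ValueError("Missing ]")
    return members, negated, i + 1


def set_matches(set_pat, ch):
    """True if the single character ch matches the set written as set_pat."""
    members, negated, _ = parse_set(set_pat)
    return (ch in members) != negated


print(sorted(parse_set("[a-b-c]")[0]))  # ['a', 'b', 'c']
print(parse_set("[-]]"))  # ({'-'}, False, 3)
print(set_matches("[^]-]", "x"))  # True
print(set_matches("[^]-]", "-"))  # False
```

Under these assumptions every example in [4] comes out as listed, including [-]], where the returned index 3 leaves the final ] outside the set as an ordinary character.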
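Rules [5], [6], [16] and [17] together describe a backtracking matcher: a closure first takes as many characters as it can, then gives them back one at a time until the rest of the pattern matches, and ^ and $ are anchors only at the ends of the pattern. The Python below is a hypothetical sketch of that idea, not the editor's code; compile_pattern, match_here and search are invented names. To stay short it rejects sets, tags, word boundaries and the \w, \W and blank escapes, supports only \a, \d and \n as classes, and ignores case-fold-search.

```python
# Hypothetical sketch of rules [1]-[3], [5], [6], [16] and [17]; not the
# editor's implementation. Sets, tags, \< \>, \w \W, \<blank> and
# case-fold-search are deliberately left out.
import string

ALPHA = set(string.ascii_letters)
DIGIT = set(string.digits)
CLASSES = {"a": ALPHA, "d": DIGIT, "n": ALPHA | DIGIT}  # [10]-[12]
UNSUPPORTED = "()0123456789<>wW "  # escapes this sketch refuses to guess at


def compile_pattern(pat):
    """Split pat into (anchored_start, anchored_end, atoms).

    Each atom is [test, closure]: test(ch) -> bool, closure is "", "*" or "+".
    """
    anchored_start = pat.startswith("^")  # [17]: ^ is an anchor only first
    i = 1 if anchored_start else 0
    atoms, anchored_end = [], False
    while i < len(pat):
        c = pat[i]
        if c == "$" and i == len(pat) - 1:  # [17]: $ is an anchor only last
            anchored_end = True
        elif c in "*+":  # [5], [6]: closure applies to the previous atom
            if not atoms or atoms[-1][1]:
                raise ValueError(f"'{c}' must follow a single-character form")
            atoms[-1][1] = c
        elif c == "[":
            raise ValueError("sets ([4]) are not supported by this sketch")
        elif c == "\\":  # [3]: escape
            i += 1
            if i == len(pat):
                raise ValueError("pattern ends with a lone backslash")
            e = pat[i]
            if e in UNSUPPORTED:
                raise ValueError("\\" + e + " is not supported by this sketch")
            if e in CLASSES:
                atoms.append([CLASSES[e].__contains__, ""])
            else:
                atoms.append([lambda ch, e=e: ch == e, ""])
        elif c == ".":  # [2]: any character
            atoms.append([lambda ch: True, ""])
        else:  # [1]: ordinary character (also ^ and $ in mid-pattern)
            atoms.append([lambda ch, c=c: ch == c, ""])
        i += 1
    return anchored_start, anchored_end, atoms


def match_here(atoms, k, text, j, anchored_end):
    """End index of a match of atoms[k:] starting at text[j], or -1."""
    if k == len(atoms):
        return j if not anchored_end or j == len(text) else -1
    test, closure = atoms[k]
    if not closure:
        if j < len(text) and test(text[j]):
            return match_here(atoms, k + 1, text, j + 1, anchored_end)
        return -1
    # Closure: take as many characters as possible, then give them back
    # one at a time until the rest of the pattern matches ([16]).
    n = j
    while n < len(text) and test(text[n]):
        n += 1
    fewest = j + 1 if closure == "+" else j
    while n >= fewest:
        end = match_here(atoms, k + 1, text, n, anchored_end)
        if end != -1:
            return end
        n -= 1
    return -1


def search(pat, line):
    """Return (start, end) of the leftmost match in line, or None."""
    anchored_start, anchored_end, atoms = compile_pattern(pat)
    for start in [0] if anchored_start else range(len(line) + 1):
        end = match_here(atoms, 0, line, start, anchored_end)
        if end != -1:
            return start, end
    return None


print(search("foo*.*", "fobar"))  # (0, 5)
print(search(r"foo\\+", "xfoo\\\\"))  # (1, 6)
print(search(r"\d+", "ab123c"))  # (2, 5)
print(search("^abc", "xabc"))  # None
print(search("ab$c", "ab$c"))  # (0, 4)
```

Because the closure loop tries the longest run first, \d+ on "ab123c" reports (2, 5) rather than (2, 3), which is the behaviour [16] describes; the last line shows a $ that is not at the end of the pattern being matched as an ordinary character.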
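The replace-string rules under RE Substitutions are simple enough to sketch as a single pass over the replace string. The Python function below, expand_replacement, is a hypothetical name and not the editor's code. To stay self-contained it borrows a match object from Python's re module, whose syntax differs from the one above (tags are written ( ) rather than \( \)); that is an assumption of the example only. Treating an unset or missing tag as empty text, a trailing \ as a literal backslash, and \0 as a literal 0 are also assumptions, since the document does not say.

```python
# Hypothetical sketch of the RE Substitutions rules; not the editor's code.
# Python's re module is used only to obtain a match object.
import re


def expand_replacement(template, m):
    """Return the replacement text for the re.Match m.

    "&" inserts the whole match, a backslash followed by 1-9 inserts that
    tag, a backslash followed by any other character inserts that
    character, and all other text is copied verbatim.
    """
    out = []
    i = 0
    while i < len(template):
        c = template[i]
        if c == "&":  # [1]: entire matched string
            out.append(m.group(0))
        elif c == "\\" and i + 1 < len(template):
            i += 1
            nxt = template[i]
            if nxt in "123456789":  # [2]: tagged subpattern n
                n = int(nxt)
                # Assumption: a missing or unset tag expands to "".
                tag = m.group(n) if n <= m.re.groups else None
                out.append(tag or "")
            else:  # [3]: next character literally (assumed to cover \0)
                out.append(nxt)
        else:  # verbatim text, including (by assumption) a trailing "\"
            out.append(c)
        i += 1
    return "".join(out)


# Python's (...) plays the role of \(...\) here.
m = re.search(r"(fo.*)-\1", "xx foo-foo yy")
print(expand_replacement(r"[&] \1!", m))  # [foo-foo] foo!
print(expand_replacement(r"\& costs \\1", m))  # & costs \1
```

The second call shows why [3] matters: escaping & or \ is the only way to get either character into the destination literally.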