or
char *strsed(string, command, range)
char *string;
char *command;
int range[2];
Strsed can be used to provide the functionality of most of the other more "complicated" string functions (e.g. strchr, strrchr, strpbrk, strspn, strcspn, and strtok), although less efficiently in each case, due to its generality. Strsed is a very powerful and general function that can be used to carry out complicated string manipulations such as those that are possible in text editors.
Both string and command may contain the following C-like escape sequences:
\b      Backspace.
\f      Formfeed.
\n      Newline.
\r      Carriage Return.
\s      Space.
\t      Horizontal Tab.
\v      Vertical Tab.
\z      Used to remove ambiguity if necessary.
\0-9    A reference to a register (except for \0 in a regular expression).
\0x3d   The character whose value is 3d hexadecimal.
\0X3d   The character whose value is 3d hexadecimal.
\040    The character whose value is 40 octal.
\32     The character whose value is 32 decimal.
The NUL (0) character cannot be specified. A ``\'' followed by one to three digits can be interpreted in several ways. If one or two hex digits are preceded by an ``x'' or an ``X'', they will be taken as specifying a character in hexadecimal. If there are exactly three octal digits and the first is in the range ``0'' to ``3'' then they are taken as specifying a character in octal. Otherwise a single digit is taken to be a register reference and two or three digits are interpreted as specifying a character in decimal. \z can be used to avoid problems with ambiguity. For instance, \007 will be interpreted by strsed as octal 007. To specify the contents of register zero (\0) followed by the two characters ``07'', use \0\z07. The \z makes it clear what is meant (acting like a punctuation mark) and is otherwise ignored.
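The digit rules above can be sketched in C as follows. This is only an illustration of the rules as described here, not code taken from strsed. The names esc_kind and classify_digits are hypothetical, the hexadecimal form is left out, and the sketch assumes that at most three digits are examined.

```c
#include <ctype.h>

/* Hypothetical names, used only for this illustration. */
enum esc_kind { ESC_NONE, ESC_REGISTER, ESC_OCTAL, ESC_DECIMAL };

/*
 * Classify the digits that follow a backslash.  s points just past
 * the '\'.  On success *value receives the register number or the
 * character value.  Assumption: at most three digits are examined.
 */
enum esc_kind classify_digits(const char *s, int *value)
{
    int n = 0;
    int i;

    while (n < 3 && isdigit((unsigned char)s[n]))
        n++;
    if (n == 0)
        return ESC_NONE;

    /* Exactly three octal digits, the first in the range 0 to 3. */
    if (n == 3 && s[0] <= '3' && s[1] <= '7' && s[2] <= '7') {
        *value = (s[0] - '0') * 64 + (s[1] - '0') * 8 + (s[2] - '0');
        return ESC_OCTAL;
    }

    /* A single digit is a register reference. */
    if (n == 1) {
        *value = s[0] - '0';
        return ESC_REGISTER;
    }

    /* Two or three digits that are not octal are read as decimal. */
    *value = 0;
    for (i = 0; i < n; i++)
        *value = *value * 10 + (s[i] - '0');
    return ESC_DECIMAL;
}
```

Under these assumptions, the digits 007 classify as octal 7, the digits 32 as decimal 32, and the 0 in \0\z07 as a reference to register zero, because the \z stops the run of digits.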
Strsed allows ed(1)-like regular expressions and substitutions on string. The search and replace command is specified by command. The format of command is either
/search_pattern/replacement/
or
g/search_pattern/replacement/
In the first form, the search and replace is performed once on the string, and in the second, the replacement is done globally (i.e. for every occurrence of the search pattern in string). A leading ``s'' in the above is silently ignored. This allows for a syntax more like that of ed(1). For example, s/e/x/ is the same as /e/x/.
If replacement is empty, then the matched text will be replaced by nothing - i.e. deleted.
Search_pattern is a full regular expression (see ed(1)), including register specifications (i.e. \( ... \)) and register references (e.g. \2), but not the {m,n} repetition feature of ed(1).
Replacement consists of ordinary characters and/or register references (e.g. \1 or \2). \0 means the entire matched text. In addition, a register reference may be immediately followed by a transliteration request, of the form
{char-list-1}{char-list-2}.
The characters from char-list-1 will be transliterated into the corresponding ones from char-list-2 in the same manner as tr(1). If the register reference before a transliteration request is omitted, it defaults to \0. Within a transliteration request, the characters "}" and "-" are metacharacters and must be escaped with a leading \ if you want them to be interpreted literally. Character ranges such as a-z are expanded in the same fashion as tr(1). If char-list-2 is shorter than char-list-1 then char-list-2 is padded to be the same length as char-list-1 by repeating its last character as many times as are needed. For example, the transliteration request
{a-z}{X}
will transliterate all lower case letters into an 'X'. Character ranges may be increasing or decreasing.
Unusual character ranges (such as a-f-0-\0x2d-c) are interpreted as running from their first character to their last (so the above would be treated as a-c). Note that it is not possible (in this release) to specify the complement of a character range in a transliteration request. However, this can be done in the search_pattern by commencing a character class with a "^" in the normal regular expression fashion.
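The padding and range rules for a transliteration request can be illustrated with a small stand-alone sketch. It is not strsed's own code: expand_list and translit are hypothetical helpers, and escapes and the unusual multi-dash ranges mentioned above are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Expand a char-list such as "a-z" or "z-a" into out and return its
 * length.  Hypothetical helper: no escape handling, outsz must be >= 1.
 */
size_t expand_list(const char *spec, char *out, size_t outsz)
{
    size_t n = 0;
    size_t i;

    for (i = 0; spec[i] != '\0' && n + 1 < outsz; i++) {
        if (spec[i + 1] == '-' && spec[i + 2] != '\0') {
            int lo = (unsigned char)spec[i];
            int hi = (unsigned char)spec[i + 2];
            int step = lo <= hi ? 1 : -1;   /* increasing or decreasing */
            int c;

            for (c = lo; n + 1 < outsz; c += step) {
                out[n++] = (char)c;
                if (c == hi)
                    break;
            }
            i += 2;                         /* skip "-x" */
        } else {
            out[n++] = spec[i];
        }
    }
    out[n] = '\0';
    return n;
}

/*
 * Transliterate s in place.  If list2 expands to fewer characters
 * than list1, its last character is reused, which has the same
 * effect as padding list2 to the length of list1.
 */
void translit(char *s, const char *list1, const char *list2)
{
    char from[257], to[257];
    size_t nf = expand_list(list1, from, sizeof from);
    size_t nt = expand_list(list2, to, sizeof to);

    if (nf == 0 || nt == 0)
        return;
    for (; *s != '\0'; s++) {
        const char *p = memchr(from, *s, nf);
        if (p != NULL) {
            size_t k = (size_t)(p - from);
            *s = to[k < nt ? k : nt - 1];
        }
    }
}
```

With this sketch, translit(buf, "a-z", "X") turns every lower case letter in buf into an 'X', and translit(buf, "a-z", "A-Z") converts buf to upper case.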
The highest register that can be referenced is \9.
/a/A/           # Change the first 'a' into an 'A'
g/a/A/          # Change every 'a' into an 'A'
g/:://          # Delete every ':'
g/jack/jill/    # Change every 'jack' to a 'jill'
/[^\s\t]/X/     # Change the first non-whitespace
                # character into an 'X'.
Some more advanced examples...
/\([\s\t]*\)\([^\s\t]*\)/\1\2{a-z}{A-Z}/
This converts the first non-whitespace word to upper case, preserving any initial whitespace. It catches the first run of spaces and TABs into register one \([\s\t]*\), and then the following run of non-white characters into register two \([^\s\t]*\). The replacement, \1\2{a-z}{A-Z} specifies register 1 (the whitespace) followed by the contents of register 2 transliterated into uppercase. This would produce
" SPOTTED pinto bean"
if called on the string
" spotted pinto bean".
g/\([a-z]\)\1+/\1/
This is a very useful example and performs the same function as tr -s. That is, it squeezes runs of identical characters (in the range a to z) down to a single instance of that character. So "beeee good" becomes "be god". The "+" is the regular expression notation meaning "one or more".
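For comparison, the effect of this command can be written by hand in C. The function below is a hypothetical stand-alone equivalent, not part of strsed, and it assumes an ASCII-like character set in which 'a' to 'z' are contiguous.

```c
/*
 * Squeeze runs of identical lower case letters in s down to a single
 * instance, in place.  Hypothetical equivalent of g/\([a-z]\)\1+/\1/.
 */
void squeeze_lower(char *s)
{
    char *out = s;
    char prev = '\0';

    for (; *s != '\0'; s++) {
        /* Drop a lower case letter that repeats the previous one. */
        if (*s >= 'a' && *s <= 'z' && *s == prev)
            continue;
        prev = *s;
        *out++ = *s;
    }
    *out = '\0';
}
```

Calling squeeze_lower on a buffer holding "beeee good" leaves "be god" in the buffer. Characters outside a to z, such as repeated spaces, are left alone.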
g/\([\t\s]*\)\(.\)\([^\t\s]*\)/\1\2{a-z}{A-Z}\3/
This example capitalises the first letter of each word in the string, and preserves all whitespace. It catches three things,
1) the initial whitespace \([\t\s]*\) in register 1
2) the next letter \(.\) in register 2
3) the following nonwhite letters \([^\t\s]*\) in register 3
and then prints them out as they were found, with the only difference being the uppercase conversion of the contents of register 2. Given the string
" this is a line "
this command would return
" This Is A Line ".
If the initial 'g' was not present in the command, then the capitalisation would only be done to the first word in the string. It is important to understand this difference well.
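The difference between the global and non-global forms can be seen in a hand-written C sketch of the same operation. The function capitalise_words and its global flag are hypothetical and only mimic the behaviour described here; whitespace is taken to be space and TAB, as in the pattern.

```c
#include <ctype.h>

/*
 * Upper-case the first character of each word in s, in place.
 * With global == 0 only the first word is changed, which mirrors
 * leaving the 'g' off the command.  Hypothetical illustration.
 */
void capitalise_words(char *s, int global)
{
    int in_word = 0;

    for (; *s != '\0'; s++) {
        int white = (*s == ' ' || *s == '\t');

        if (!white && !in_word) {
            *s = (char)toupper((unsigned char)*s);
            if (!global)
                return;         /* one substitution only */
        }
        in_word = !white;
    }
}
```

With global set to 1 every word is capitalised. With global set to 0 the function stops after the first word, just as the command without the leading 'g' performs a single substitution.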
strsed("two big macs please", "/b.*c/", range);
range[0] will contain 4 and range[1] will contain 11 (the match, "big mac", starts at index 4 and ends just before index 11). If no match is found, both elements of range will contain -1.
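The same offsets can be reproduced with the POSIX regcomp(3) and regexec(3) functions, which report a match as a start offset and an offset one past the end. The wrapper find_range below is a hypothetical helper written for this comparison. It uses POSIX basic regular expressions, not strsed's own matcher, so it only illustrates the range convention.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Store the start offset of the first match of pattern in range[0]
 * and the offset one past its end in range[1].  Both are set to -1
 * if the pattern does not compile or does not match.
 * Hypothetical helper built on POSIX regex, not on strsed.
 */
void find_range(const char *string, const char *pattern, int range[2])
{
    regex_t re;
    regmatch_t m;

    range[0] = range[1] = -1;
    if (regcomp(&re, pattern, 0) != 0)
        return;
    if (regexec(&re, string, 1, &m, 0) == 0) {
        range[0] = (int)m.rm_so;
        range[1] = (int)m.rm_eo;
    }
    regfree(&re);
}
```

For the string "two big macs please" and the pattern b.*c this yields 4 and 11, the same values as the strsed call above.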
terry@distel.pcs.com
or ...!{pyramid,unido}!pcsbst!distel!terry