
Using Regular Expressions

Regular expressions are used widely throughout the system and are powerful mechanisms for locating and manipulating patterns in text. For compatibility with a variety of historic UNIX systems, the IRIX Developer's Option includes several distinct regular expression libraries, listed in Table 6-6. Only the last of these, wsregexp, supports internationalization.

Table 6-6 Regular Expression Libraries in IRIX

Library Documentation | Type of Support Provided
regcmp(3G) | Function regcmp() compiles a pattern string; regex() applies the pattern to a target string. Syntax is said to be that of ed, but "syntax and semantics have been changed slightly" in unspecified ways.
regcmp(1) | Command that applies regcmp() to a file of pattern strings, generating C code for literal strings that can be included in a source program, so that patterns need not be compiled at run time.
REGEX(3) | Function re_comp() compiles a pattern string; re_exec() applies the last-compiled pattern to a target string. There is no means of storing compiled patterns. The supported syntax is not documented, but the page cross-references ed(1), with which it may or may not be compatible.
regexp(5) | Function compile() compiles a pattern string; step() or advance() applies a stored pattern to a target string. An unusual interface compiles these functions directly into your source module, using macro functions you must define. The pattern syntax is clearly documented.
wsregexp(3W) | Function wsrecompile() compiles a pattern string; wsrestep() or wsrematch() applies a pattern to a target. Both pattern and target strings are wide characters. The expression syntax is that of regexp, augmented with internationalization expressions.


Internationalized Regular Expressions

A few utilities distributed with IRIX, in particular grep (see the grep(1) reference page), support internationalized regular expressions, which provide additional syntax for matching character classes, sequences, or ranges. The character expressions supported by the wsregexp library are shown in Table 6-7.

Table 6-7 Character Expressions in Internationalized Regular Expressions

Expression | Description
c | The single character c, where c is not a special character.
[[:class:]] | A character class expression: any character of type class, as defined by category LC_CTYPE in the program's locale (see, for example, isalpha()). For class, substitute one of the following:
    alpha, a letter
    upper, an uppercase letter
    lower, a lowercase letter
    digit, a decimal digit
    xdigit, a hexadecimal digit
    alnum, an alphanumeric character (letter or digit)
    space, a character that produces white space in displayed text
    punct, a punctuation character
    print, a printing character
    graph, a character with a visible representation
    cntrl, a control character
[[=c=]] | An equivalence class: any collation element defined as having the same relative order in the current collation sequence as c. For example, if A and a belong to the same equivalence class, then both [[=A=]b] and [[=a=]b] are equivalent to [Aab].
[[.cc.]] | A collating symbol. Multi-character collating elements must be represented as collating symbols to distinguish them from single-character collating elements. For example, if the string ch is a valid collating element, then [[.ch.]] is treated as an element matching that string of characters, while [ch] is treated as a simple list of c and h. If the string is not a valid collating element in the current collating sequence definition, the symbol is treated as an invalid expression.
[c-c] | Any collation element in the character expression range c-c, where c can identify a collating symbol or an equivalence class. If the hyphen character (-) appears immediately after an opening square bracket or immediately before a closing square bracket, it has no special meaning.

Within square brackets, a period (.) that is not part of a [[.c.]] sequence, a colon (:) that is not part of a [[:class:]] sequence, and an equals sign (=) that is not part of a [[=c=]] sequence each match themselves.

Table 6-8 shows examples of simple internationalized regular expressions.

Table 6-8 Examples of Internationalized Regular Expressions

Pattern | Definition
[[=a=]]bcd | any form of a followed by bcd
[[.ch.]-e] | any element that collates between ch and e
[[:lower:]] | any lowercase letter

