home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
OS/2 Shareware BBS: 22 gnu
/
22-gnu.zip
/
GNUTUE.ZIP
/
man
/
tr.man
< prev
next >
Wrap
Text File
|
1992-07-18
|
9KB
|
224 lines
NAME
tr - translate or delete characters
SYNOPSIS
tr [-cst] [--complement] [--squeeze-repeats] [--truncate-set1] string1
string2
tr {-s,--squeeze-repeats} [-c] [--complement] string1
tr {-d,--delete} [-c] string1
tr {-d,--delete} {-s,--squeeze-repeats} [-c] [--complement] string1
string2
DESCRIPTION
This manual page documents the GNU version of tr. tr copies the standard
input to the standard output, performing one of the following operations:
o translate, and optionally squeeze repeated characters in the
result
o squeeze repeated characters
o delete characters
o delete characters, then squeeze repeated characters from the
result.
The string1 and (if given) string2 arguments define ordered sets of
characters, referred to below as set1 and set2. These sets are the
characters of the input that tr operates on. The --complement (-c)
option replaces set1 with its complement (all of the characters that are
not in set1).
SPECIFYING SETS OF CHARACTERS
The format of the string1 and string2 arguments resembles the format of
regular expressions; however, they are not regular expressions, only
lists of characters. Most characters simply represent themselves in
these strings, but the strings can contain the shorthands listed below,
for convenience. Some of them can be used only in string1 or string2, as
noted below.
Backslash excapes. A backslash followed by a character not listed below
causes an error message.
\a Control-G.
\b Control-H.
\f Control-L.
\n Control-J.
\r Control-M.
\t Control-I.
\v Control-K.
\ooo The character with the value given by ooo, which is 1 to 3 octal
digits.
\\ A backslash.
Ranges. The notation `m-n' expands to all of the characters from m
through n, in ascending order. m should collate before n; if it doesn't,
an error results. As an example, `0-9' is the same as `0123456789'.
Ranges can optionally be enclosed in square brackets, which has no effect
but is supported for compatibility with historical System V versions of
tr.
Repeated characters. The notation `[c*n]' in string2 expands to n copies
of character c. Thus, `[y*6]' is the same as `yyyyyy'. The notation
`[c*]' in string2 expands to as many copies of c as are needed to make
set2 as long as set1. If n begins with a 0, it is interpreted in octal,
otherwise in decimal.
Character classes. The notation `[:class-name:]' expands to all of the
characters in the (predefined) class named class-name. The characters
expand in no particular order, except for the `upper' and `lower'
classes, which expand in ascending order. When the --delete (-d) and
--squeeze-repeats (-s) options are both given, any character class can be
used in string2. Otherwise, only the character classes `lower' and
`upper' are accepted in string2, and then only if the corresponding
character class (`upper' and `lower', respectively) is specified in the
same relative position in string1. Doing this specifies case conversion.
The class names are given below; an error results when an invalid class
name is given.
alnum
Letters and digits.
alpha
Letters.
blank
Horizontal whitespace.
cntrl
Control characters.
digit
Digits.
graph
Printable characters, not including space.
lower
Lowercase letters.
print
Printable characters, including space.
punct
Punctuation characters.
space
Horizontal or vertical whitespace.
upper
Uppercase letters.
xdigit
Hexadecimal digits.
Equivalence classes. The syntax `[=c=]' expands to all of the characters
that are equivalent to c, in no particular order. Equivalence classes
are a recent invention intended to support non-English alphabets. But
there seems to be no standard way to define them or determine their
contents. Therefore, they are not fully implemented in GNU tr; each
character's equivalence class consists only of that character, which
makes this a useless construction currently.
TRANSLATING
tr performs translation when string1 and string2 are both given and the
--delete (-d) option is not given. tr translates each character of its
input that is in set1 to the corresponding character in set2. Characters
not in set1 are passed through unchanged. When a character appears more
than once in set1 and the corresponding characters in set2 are not all
the same, only the final one is used. For example, these two commands
are equivalent:
tr aaa xyz
tr a z
A common use of tr is to convert lowercase characters to uppercase. This
can be done in many ways. Here are three of them:
tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
tr a-z A-Z
tr '[:lower:]' '[:upper:]'
When tr is performing translation, set1 and set2 should normally have the
same length. If set1 is shorter than set2, the extra characters at the
end of set2 are ignored.
On the other hand, making set1 longer than set2 is not portable; POSIX.2
says that the result is undefined. In this situation, the BSD tr pads
set2 to the length of set1 by repeating the last character of set2 as
many times as necessary. The System V tr truncates set1 to the length of
set2.
By default, GNU tr handles this case like the BSD tr does. When the
--truncate-set1 (-t) option is given, GNU tr handles this case like the
System V tr instead. This option is ignored for operations other than
translation.
Acting like the System V tr in this case breaks the relatively common BSD
idiom:
tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in the complement
of set1), rather than all non-alphanumerics, to newlines.
SQUEEZING REPEATS AND DELETING
When given just the --delete (-d) option, tr removes any input characters
that are in set1.
When given just the --squeeze-repeats (-s) option, tr replaces each input
sequence of a repeated character that is in set1 with a single occurrence
of that character.
When given both the --delete and the --squeeze-repeats options, tr first
performs any deletions using set1, then squeezes repeats from any
remaining characters using set2.
The --squeeze-repeats option may also be used when translating, in which
case tr first peforms translation, then squeezes repeats from any
remaining characters using set2.
Here are some examples to illustrate various combinations of options:
Remove all zero bytes:
tr -d '\000'
Put all words on lines by themselves. This converts all non-alphanumeric
characters to newlines, then squeezes each string of repeated newlines
into a single newline:
tr -cs '[a-zA-Z0-9]' '[\n*]'
Convert each sequence of repeated newlines to a single newline:
tr -s '\n'
WARNING MESSAGES
Setting the environment variable POSIXLY_CORRECT turns off several
warning and error messages, for strict compliance with POSIX.2. The
messages normally occur in the following circumstances:
1. When the --delete option is given but --squeeze-repeats is not, and
string2 is given, GNU tr by default prints a usage message and exits,
because string2 would not be used. The POSIX specification says that
string2 must be ignored in this case. Silently ignoring arguments is a
bad idea.
2. When an ambiguous octal escape is given. For example, \400 is
actually \40 followed by the digit 0, because the value 400 octal does
not fit into a single byte.
Note that GNU tr does not provide complete BSD or System V compatibility.
For example, there is no option to disable interpretation of the POSIX
constructs [:alpha:], [=c=], and [c*10]. Also, GNU tr does not delete
zero bytes automatically, unlike traditional UNIX versions, which provide
no way to preserve zero bytes.
The long-named options can be introduced with `+' as well as `--', for
compatibility with previous releases. Eventually support for `+' will be
removed, because it is incompatible with the POSIX.2 standard.