TR(1L) TR(1L)
NAME
tr - translate or delete characters
SYNOPSIS
tr [-cst] [--complement] [--squeeze-repeats] [--truncate-set1] string1 string2
tr {-s,--squeeze-repeats} [-c] [--complement] string1
tr {-d,--delete} [-c] string1
tr {-d,--delete} {-s,--squeeze-repeats} [-c] [--complement] string1 string2
GNU tr also accepts the --help and --version options.
DESCRIPTION
This manual page documents the GNU version of tr. tr
copies the standard input to the standard output, performing
one of the following operations:
  o  translate, and optionally squeeze repeated characters
     in the result
  o  squeeze repeated characters
  o  delete characters
  o  delete characters, then squeeze repeated characters
     from the result.
The string1 and (if given) string2 arguments define
ordered sets of characters, referred to below as set1 and
set2. These sets are the characters of the input that tr
operates on. The --complement (-c) option replaces set1
with its complement (all of the characters that are not in
set1).
SPECIFYING SETS OF CHARACTERS
The format of the string1 and string2 arguments resembles
the format of regular expressions; however, they are not
regular expressions, only lists of characters. Most characters
simply represent themselves in these strings, but
the strings can contain the shorthands listed below, for
convenience. Some of them can be used only in string1 or
string2, as noted below.
Backslash escapes. A backslash followed by a character
not listed below causes an error message.
\a Control-G.
\b Control-H.
\f Control-L.
\n Control-J.
\r Control-M.
\t Control-I.
\v Control-K.
\ooo The character with the value given by ooo, which is
1 to 3 octal digits.
\\ A backslash.
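As a small illustration of the escapes listed above, the following sketch replaces tabs with spaces and strips carriage returns. The function names are hypothetical and exist only for this example; the expected output in the comments assumes a GNU tr.

```shell
# Hypothetical helper names; only tr itself comes from this manual page.

# Replace each tab (Control-I) with a space.
tabs_to_spaces() { tr '\t' ' '; }

# Delete carriage returns (Control-M), written here in octal as \015.
strip_cr() { tr -d '\015'; }

printf 'a\tb\n' | tabs_to_spaces    # prints "a b"
printf 'line\r\n' | strip_cr        # prints "line"
```

The escapes are quoted so that the shell passes the backslash through to tr unchanged.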
Ranges. The notation `m-n' expands to all of the characters
from m through n, in ascending order. m should collate
before n; if it doesn't, an error results. As an
example, `0-9' is the same as `0123456789'. Although GNU
tr does not support the System V syntax that uses square
brackets to enclose ranges, translations specified in that
format will still work as long as the brackets in string1
correspond to identical brackets in string2.
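A brief sketch of both range styles follows. The function names are hypothetical, and the outputs shown assume a GNU tr. In the System V style call, the brackets are ordinary characters that happen to map to themselves.

```shell
# Hypothetical helper names, used only for this illustration.

# Map the digits 0-9 onto the letters a-j, position by position.
digits_to_letters() { tr '0-9' 'a-j'; }

# System V style: '[' maps to '[' and ']' maps to ']'.
sysv_upper() { tr '[a-z]' '[A-Z]'; }

printf '2024\n' | digits_to_letters   # prints "cace"
printf 'x[y]\n' | sysv_upper          # prints "X[Y]"
```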
Repeated characters. The notation `[c*n]' in string2
expands to n copies of character c. Thus, `[y*6]' is the
same as `yyyyyy'. The notation `[c*]' in string2 expands
to as many copies of c as are needed to make set2 as long
as set1. If n begins with a 0, it is interpreted in
octal, otherwise in decimal.
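The repeat notation can be tried out with a sketch like the one below. The function names are hypothetical, and the outputs assume a GNU tr.

```shell
# Hypothetical helper names, used only for this illustration.

# [x*] pads set2 with x to the length of set1,
# so every lowercase letter becomes x.
mask_lower() { tr 'a-z' '[x*]'; }

# [y*3] is the same as yyy, so a, b and c map to y and d maps to z.
mark() { tr 'abcd' '[y*3]z'; }

printf 'hello world\n' | mask_lower   # prints "xxxxx xxxxx"
printf 'abcd\n' | mark                # prints "yyyz"
```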
Character classes. The notation `[:class-name:]' expands
to all of the characters in the (predefined) class named
class-name. The characters expand in no particular order,
except for the `upper' and `lower' classes, which expand
in ascending order. When the --delete (-d) and
--squeeze-repeats (-s) options are both given, any character
class can be used in string2. Otherwise, only the
character classes `lower' and `upper' are accepted in
string2, and then only if the corresponding character
class (`upper' and `lower', respectively) is specified in
the same relative position in string1. Doing this specifies
case conversion. The class names are given below; an
error results when an invalid class name is given.
alnum Letters and digits.
alpha Letters.
blank Horizontal whitespace.
cntrl Control characters.
digit Digits.
graph Printable characters, not including space.
lower Lowercase letters.
print Printable characters, including space.
punct Punctuation characters.
space Horizontal or vertical whitespace.
upper Uppercase letters.
xdigit Hexadecimal digits.
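The following sketch shows a class in string1 with --delete, and a class in string2 when --delete and --squeeze-repeats are combined. The function names are hypothetical; the outputs assume a GNU tr. The classes are quoted so that the shell does not treat the brackets as a filename pattern.

```shell
# Hypothetical helper names, used only for this illustration.

# Delete every digit.
strip_digits() { tr -d '[:digit:]'; }

# Delete digits (set1), then squeeze runs of blanks (set2).
strip_digits_squeeze_blanks() { tr -ds '[:digit:]' '[:blank:]'; }

printf 'a1b22c\n' | strip_digits                  # prints "abc"
printf 'a1  b2\n' | strip_digits_squeeze_blanks   # prints "a b"
```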
Equivalence classes. The syntax `[=c=]' expands to all of
the characters that are equivalent to c, in no particular
order. Equivalence classes are a recent invention
intended to support non-English alphabets. But there
seems to be no standard way to define them or determine
their contents. Therefore, they are not fully implemented
in GNU tr; each character's equivalence class consists
only of that character, which makes this a useless construction
currently.
TRANSLATING
tr performs translation when string1 and string2 are both
given and the --delete (-d) option is not given. tr
translates each character of its input that is in set1 to
the corresponding character in set2. Characters not in
set1 are passed through unchanged. When a character
appears more than once in set1 and the corresponding characters
in set2 are not all the same, only the final one is
used. For example, these two commands are equivalent:
tr aaa xyz
tr a z
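To see the "final one is used" rule in action, both commands can be fed the same input. This is a minimal sketch; the sample input is arbitrary and the output shown assumes a GNU tr.

```shell
# 'a' appears three times in set1; only its last mapping (a -> z) is used.
printf 'banana\n' | tr aaa xyz   # prints "bznznz"
printf 'banana\n' | tr a z       # prints "bznznz"
```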
A common use of tr is to convert lowercase characters to
uppercase. This can be done in many ways. Here are three
of them:
tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
tr a-z A-Z
tr '[:lower:]' '[:upper:]'
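As a quick check, the three forms can be run on the same sample input. The input string is arbitrary, and the output shown assumes a GNU tr and ASCII text.

```shell
# Each of the three commands prints "HELLO, WORLD".
printf 'Hello, world\n' | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
printf 'Hello, world\n' | tr a-z A-Z
printf 'Hello, world\n' | tr '[:lower:]' '[:upper:]'
```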
When tr is performing translation, set1 and set2 should
normally have the same length. If set1 is shorter than
set2, the extra characters at the end of set2 are ignored.
On the other hand, making set1 longer than set2 is not
portable; POSIX.2 says that the result is undefined. In
this situation, the BSD tr pads set2 to the length of set1
by repeating the last character of set2 as many times as
necessary. The System V tr truncates set1 to the length
of set2.
By default, GNU tr handles this case like the BSD tr does.
When the --truncate-set1 (-t) option is given, GNU tr handles
this case like the System V tr instead. This option
is ignored for operations other than translation.
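The difference between the two behaviours can be sketched as follows. The inputs are arbitrary, and the outputs assume a GNU tr; other implementations may legitimately differ, since POSIX.2 leaves the longer-set1 case undefined.

```shell
# set1 longer than set2, default (BSD-like): set2 is padded with its last character.
printf 'abcde\n' | tr abcde xy      # prints "xyyyy"

# Same sets with -t (System V-like): set1 is truncated to "ab".
printf 'abcde\n' | tr -t abcde xy   # prints "xycde"

# set1 shorter than set2: the extra 'z' in set2 is ignored.
printf 'abc\n' | tr ab xyz          # prints "xyc"
```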
Acting like the System V tr in this case breaks the
relatively common BSD idiom:
tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in
the complement of set1), rather than all non-alphanumerics,
to newlines.
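A short sketch of the idiom with and without truncation follows. The sample input is arbitrary and the outputs assume a GNU tr.

```shell
# Default: every non-alphanumeric becomes a newline, and repeated newlines are squeezed.
printf 'one, two\n' | tr -cs A-Za-z0-9 '\012'    # prints "one" and "two" on separate lines

# With -t, the complemented set1 is cut down to its first element (the zero byte),
# so this input passes through unchanged.
printf 'one, two\n' | tr -cst A-Za-z0-9 '\012'   # prints "one, two"
```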
SQUEEZING REPEATS AND DELETING
When given just the --delete (-d) option, tr removes any
input characters that are in set1.
When given just the --squeeze-repeats (-s) option, tr
replaces each input sequence of a repeated character that
is in set1 with a single occurrence of that character.
When given both the --delete and the --squeeze-repeats
options, tr first performs any deletions using set1, then
squeezes repeats from any remaining characters using set2.
The --squeeze-repeats option may also be used when translating,
in which case tr first performs translation, then
squeezes repeats from any remaining characters using set2.
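The four combinations described above can be sketched on small inputs. The inputs are arbitrary and the outputs assume a GNU tr.

```shell
# -d alone: remove every e and o.
printf 'bookkeeper\n' | tr -d 'eo'         # prints "bkkpr"

# -s alone: collapse runs of e, o or k to a single character.
printf 'bookkeeper\n' | tr -s 'eok'        # prints "bokeper"

# -d and -s together: delete digits (set1), then squeeze spaces (set2).
printf 'a1  b22  c\n' | tr -ds '0-9' ' '   # prints "a b c"

# -s while translating: translate a->x and b->y, then squeeze runs of x or y.
printf 'aabbcc\n' | tr -s 'ab' 'xy'        # prints "xycc"
```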
Here are some examples to illustrate various combinations
of options:
Remove all zero bytes:
tr -d '\000'
Put all words on lines by themselves. This converts all
non-alphanumeric characters to newlines, then squeezes
each string of repeated newlines into a single newline:
tr -cs '[a-zA-Z0-9]' '[\n*]'
Convert each sequence of repeated newlines to a single
newline:
tr -s '\n'
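The three examples above can be exercised with small test inputs, as in this sketch. The inputs are made up for illustration, and the outputs assume a GNU tr and a printf that understands octal escapes in its format string.

```shell
# Remove all zero bytes.
printf 'a\000b\n' | tr -d '\000'                           # prints "ab"

# Put all words on lines by themselves.
printf 'to be, or not\n' | tr -cs '[a-zA-Z0-9]' '[\n*]'    # prints to, be, or, not on four lines

# Convert each sequence of repeated newlines to a single newline.
printf 'x\n\n\ny\n' | tr -s '\n'                           # prints "x" and "y" on adjacent lines
```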
GNU tr also accepts the following options in any combination
with the others.
--help Print a usage message and exit with a non-zero status.
--version
Print version information on standard error then
exit.
WARNING MESSAGES
Setting the environment variable POSIXLY_CORRECT turns off
several warning and error messages, for strict compliance
with POSIX.2. The messages normally occur in the following
circumstances:
1. When the --delete option is given but --squeeze-repeats
is not, and string2 is given, GNU tr by default prints a
usage message and exits, because string2 would not be used.
The POSIX specification says that string2 must be ignored
in this case. Silently ignoring arguments is a bad idea.
2. When an ambiguous octal escape is given. For example,
\400 is actually \40 followed by the digit 0, because the
value 400 octal does not fit into a single byte.
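The second case can be observed with a sketch like the following. The input is arbitrary, the warning on standard error is discarded, and the output assumes a GNU tr that parses \400 as described above.

```shell
# set1 is '\40' (a space) followed by '0', so the space maps to x and 0 maps to y.
printf 'a 0\n' | tr '\400' 'xy' 2>/dev/null   # prints "axy"
```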
Note that GNU tr does not provide complete BSD or System V
compatibility. For example, there is no option to disable
interpretation of the POSIX constructs [:alpha:], [=c=],
and [c*10]. Also, GNU tr does not delete zero bytes automatically,
unlike traditional UNIX versions, which provide
no way to preserve zero bytes.
FSF GNU Text Utilities 5