This is Info file textutils.info, produced by Makeinfo-1.64 from the
input file /ade-src/fsf/textutils/doc/textutils.texi.

START-INFO-DIR-ENTRY
* Text utilities: (textutils).          GNU text utilities.
* cat: (textutils)cat invocation.       Concatenate and write files.
* cksum: (textutils)cksum invocation.   Print POSIX CRC checksum.
* comm: (textutils)comm invocation.     Compare sorted files by line.
* csplit: (textutils)csplit invocation. Split by context.
* cut: (textutils)cut invocation.       Print selected parts of lines.
* expand: (textutils)expand invocation. Convert tabs to spaces.
* fmt: (textutils)fmt invocation.       Reformat paragraph text.
* fold: (textutils)fold invocation.     Wrap long input lines.
* head: (textutils)head invocation.     Output the first part of files.
* join: (textutils)join invocation.     Join lines on a common field.
* md5sum: (textutils)md5sum invocation. Print or check message-digests.
* nl: (textutils)nl invocation.         Number lines and write files.
* od: (textutils)od invocation.         Dump files in octal, etc.
* paste: (textutils)paste invocation.   Merge lines of files.
* pr: (textutils)pr invocation.         Paginate or columnate files.
* sort: (textutils)sort invocation.     Sort text files.
* split: (textutils)split invocation.   Split into fixed-size pieces.
* sum: (textutils)sum invocation.       Print traditional checksum.
* tac: (textutils)tac invocation.       Reverse files.
* tail: (textutils)tail invocation.     Output the last part of files.
* tr: (textutils)tr invocation.         Translate characters.
* unexpand: (textutils)unexpand invocation. Convert spaces to tabs.
* uniq: (textutils)uniq invocation.     Uniqify files.
* wc: (textutils)wc invocation.         Byte, word, and line counts.
END-INFO-DIR-ENTRY

   This file documents the GNU text utilities.

   Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

File: textutils.info, Node: sort invocation, Next: uniq invocation, Up: Operating on sorted files

`sort': Sort text files
=======================

   `sort' sorts, merges, or compares all the lines from the given
files, or standard input if none are given or for a FILE of `-'.  By
default, `sort' writes the results to standard output.  Synopsis:

     sort [OPTION]... [FILE]...

   `sort' has three modes of operation: sort (the default), merge, and
check for sortedness.  The following options change the operation mode:

`-c'
     Check whether the given files are already sorted: if they are not
     all sorted, print an error message and exit with a status of 1.
     Otherwise, exit successfully.

`-m'
     Merge the given files by sorting them as a group.  Each input file
     must already be individually sorted.  Sorting instead of merging
     always works; merging is provided because it is faster in the
     cases where it applies.
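   As a quick, hypothetical illustration of the two non-default modes
(the file names below are invented for this sketch):

     sort -c access.log || echo 'access.log is not sorted'
     sort -m jan.sorted feb.sorted > q1.sorted

   The first command only verifies order, printing a diagnostic and
exiting with status 1 when a line is out of order; the second merges
two files that are each already sorted into one sorted output file.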
   A pair of lines is compared as follows: if any key fields have been
specified, `sort' compares each pair of fields, in the order specified
on the command line, according to the associated ordering options,
until a difference is found or no fields are left.  If any of the
global options `Mbdfinr' are given but no key fields are specified,
`sort' compares the entire lines according to the global options.
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), `sort' compares the lines byte
by byte in machine collating sequence.  The last resort comparison
honors the `-r' global option.  The `-s' (stable) option disables this
last-resort comparison so that lines in which all fields compare equal
are left in their original relative order.  If no fields or global
options are specified, `-s' has no effect.

   GNU `sort' (as specified for all GNU utilities) has no limits on
input line length or restrictions on bytes allowed within lines.  In
addition, if the final byte of an input file is not a newline, GNU
`sort' silently supplies one.

   Upon any error, `sort' exits with a status of `2'.

   If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of `/tmp'.  The
`-T TEMPDIR' option in turn overrides the environment variable.

   The following options affect the ordering of output lines.  They may
be specified globally or as part of a specific key field.  If no key
fields are specified, global options apply to comparison of entire
lines; otherwise the global options are inherited by key fields that do
not specify any special options of their own.

`-b'
     Ignore leading blanks when finding sort keys in each line.

`-d'
     Sort in "phone directory" order: ignore all characters except
     letters, digits and blanks when sorting.

`-f'
     Fold lowercase characters into the equivalent uppercase characters
     when sorting so that, for example, `b' and `B' sort as equal.

`-g'
     Sort numerically, but use strtod(3) to arrive at the numeric
     values.  This allows floating point numbers to be specified in
     scientific notation, like `1.0e-34' and `10e100'.  Use this option
     only if there is no alternative; it is much slower than `-n', and
     numbers with too many significant digits will be compared as if
     they had been truncated.  In addition, numbers outside the range
     of representable double precision floating point numbers are
     treated as if they were zeroes; overflow and underflow are not
     reported.

`-i'
     Ignore characters outside the printable ASCII range 040-0176 octal
     (inclusive) when sorting.

`-M'
     An initial string, consisting of any amount of whitespace,
     followed by three letters abbreviating a month name, is folded to
     uppercase and compared in the order `JAN' < `FEB' < ... < `DEC'.
     Invalid names compare low to valid names.

`-n'
     Sort numerically: the number begins each line; specifically, it
     consists of optional leading whitespace, an optional `-' sign, and
     zero or more digits, optionally followed by a decimal point and
     zero or more digits.

     `sort -n' uses what might be considered an unconventional method
     to compare strings representing floating point numbers.  Rather
     than first converting each string to the C `double' type and then
     comparing those values, `sort' aligns the decimal points in the
     two strings and compares the strings a character at a time.  One
     benefit of this approach is its speed.  In practice it is much
     more efficient than performing the two corresponding
     string-to-double (or even string-to-integer) conversions and then
     comparing doubles.
     In addition, there is no corresponding loss of precision.
     Converting each string to `double' before comparison would limit
     precision to about 16 digits on most systems.

     Neither a leading `+' nor exponential notation is recognized.  To
     compare such strings numerically, use the `-g' option.

`-r'
     Reverse the result of comparison, so that lines with greater key
     values appear earlier in the output instead of later.

   Other options are:

`-o OUTPUT-FILE'
     Write output to OUTPUT-FILE instead of standard output.  If
     OUTPUT-FILE is one of the input files, `sort' copies it to a
     temporary file before sorting and writing the output to
     OUTPUT-FILE.

`-t SEPARATOR'
     Use character SEPARATOR as the field separator when finding the
     sort keys in each line.  By default, fields are separated by the
     empty string between a non-whitespace character and a whitespace
     character.  That is, given the input line ` foo bar', `sort'
     breaks it into fields ` foo' and ` bar'.  The field separator is
     not considered to be part of either the field preceding or the
     field following.

`-u'
     For the default case or the `-m' option, only output the first of
     a sequence of lines that compare equal.  For the `-c' option,
     check that no pair of consecutive lines compares equal.

`-k POS1[,POS2]'
     The recommended, POSIX, option for specifying a sort field.  The
     field consists of the part of the line between POS1 and POS2 (or
     the end of the line, if POS2 is omitted), inclusive.  Fields and
     character positions are numbered starting with 1.  See below.

`-z'
     Treat the input as a set of lines, each terminated by a zero byte
     (ASCII NUL (Null) character) instead of an ASCII LF (Line Feed).
     This option can be useful in conjunction with `perl -0' or
     `find -print0' and `xargs -0', which do the same in order to
     reliably handle arbitrary pathnames (even those which contain Line
     Feed characters).

`+POS1[-POS2]'
     The obsolete, traditional option for specifying a sort field.  The
     field consists of the part of the line beginning at POS1 and
     extending up to, but *not including*, POS2 (or the end of the line
     if POS2 is omitted).  Fields and character positions are numbered
     starting with 0.  See below.

   In addition, when GNU `sort' is invoked with exactly one argument,
options `--help' and `--version' are recognized.  *Note Common
options::.

   Historical (BSD and System V) implementations of `sort' have
differed in their interpretation of some options, particularly `-b',
`-f', and `-n'.  GNU `sort' follows the POSIX behavior, which is
usually (but not always!) like the System V behavior.  According to
POSIX, `-n' no longer implies `-b'.  For consistency, `-M' has been
changed in the same way.  This may affect the meaning of character
positions in field specifications in obscure cases.  The only fix is to
add an explicit `-b'.

   A position in a sort field specified with the `-k' or `+' option has
the form `F.C', where F is the number of the field to use and C is the
number of the first character from the beginning of the field (for
`+POS') or from the end of the previous field (for `-POS').  If the
`.C' is omitted, it is taken to be the first character in the field.
If the `-b' option was specified, the `.C' part of a field
specification is counted from the first nonblank character of the field
(for `+POS') or from the first nonblank character following the
previous field (for `-POS').

   A sort key option may also have any of the option letters `Mbdfinr'
appended to it, in which case the global ordering options are not used
for that particular field.
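   To make the two numbering schemes concrete, here is a small sketch
(the data file name is invented for illustration).  The first two
commands are equivalent and sort on the second field only, one using
the POSIX `-k' syntax and the other the obsolete zero-based syntax; the
third sorts on characters 2 through 4 of the first field:

     sort -k 2,2 data.txt
     sort +1 -2 data.txt
     sort -k 1.2,1.4 data.txt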
   The `-b' option may be independently attached to either or both of
the `+POS' and `-POS' parts of a field specification, and if it is
inherited from the global options it will be attached to both.  If a
`-n' or `-M' option is used, thus implying a `-b' option, the `-b'
option is taken to apply to both the `+POS' and the `-POS' parts of a
key specification.  Keys may span multiple fields.

   Here are some examples to illustrate various combinations of
options.  In them, the POSIX `-k' option is used to specify sort keys
rather than the obsolete `+POS1-POS2' syntax.

   * Sort in descending (reverse) numeric order.

          sort -nr

   * Sort alphabetically, omitting the first and second fields.  This
     uses a single key composed of the characters beginning at the
     start of field three and extending to the end of each line.

          sort -k3

   * Sort numerically on the second field and resolve ties by sorting
     alphabetically on the third and fourth characters of field five.
     Use `:' as the field delimiter.

          sort -t : -k 2,2n -k 5.3,5.4

     Note that if you had written `-k 2' instead of `-k 2,2', `sort'
     would have used all characters beginning in the second field and
     extending to the end of the line as the primary *numeric* key.
     For the large majority of applications, treating keys spanning
     more than one field as numeric will not do what you expect.

     Also note that the `n' modifier was applied to the field-end
     specifier for the first key.  It would have been equivalent to
     specify `-k 2n,2' or `-k 2n,2n'.  All modifiers except `b' apply
     to the associated *field*, regardless of whether the modifier
     character is attached to the field-start and/or the field-end part
     of the key specifier.

   * Sort the password file on the fifth field and ignore any leading
     white space.  Sort lines with equal values in field five on the
     numeric user ID in field three.

          sort -t : -k 5b,5 -k 3,3n /etc/passwd

     An alternative is to use the global numeric modifier `-n'.

          sort -t : -n -k 5b,5 -k 3,3 /etc/passwd

     Finally, to ignore both leading and trailing white space, you
     could have applied the `b' modifier to the field-end specifier for
     the first key,

          sort -t : -n -k 5b,5b -k 3,3 /etc/passwd

     or you could have used the global `-b' modifier instead of `-n'
     and an explicit `n' with the second key specifier.

          sort -t : -b -k 5,5 -k 3,3n /etc/passwd

   * Generate a tags file in case insensitive sorted order.

          find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append

     The use of `-print0', `-z', and `-0' in this case means that
     pathnames that contain Line Feed characters will not get broken up
     by the sort operation.


File: textutils.info, Node: uniq invocation, Next: comm invocation, Prev: sort invocation, Up: Operating on sorted files

`uniq': Uniqify files
=====================

   `uniq' writes the unique lines in the given `input', or standard
input if nothing is given or for an INPUT name of `-'.  Synopsis:

     uniq [OPTION]... [INPUT [OUTPUT]]

   By default, `uniq' prints the unique lines in a sorted file, i.e.,
discards all but one of identical successive lines.  Optionally, it can
instead show only lines that appear exactly once, or lines that appear
more than once.

   The input must be sorted.  If your input is not sorted, perhaps you
want to use `sort -u'.

   If no OUTPUT file is specified, `uniq' writes to standard output.

   The program accepts the following options.  Also see *Note Common
options::.

`-f N'
`--skip-fields=N'
     Skip N fields on each line before checking for uniqueness.  Fields
     are sequences of non-space, non-tab characters that are separated
     from each other by at least one space or tab.
`-s N'
`--skip-chars=N'
     Skip N characters before checking for uniqueness.  If you use both
     the field and character skipping options, fields are skipped over
     first.

`-c'
`--count'
     Print the number of times each line occurred along with the line.

`-i'
`--ignore-case'
     Ignore differences in case when comparing lines.

`-d'
`--repeated'
     Print only duplicate lines.

`-u'
`--unique'
     Print only unique lines.

`-w N'
`--check-chars=N'
     Compare N characters on each line (after skipping any specified
     fields and characters).  By default the entire rest of the lines
     are compared.


File: textutils.info, Node: comm invocation, Prev: uniq invocation, Up: Operating on sorted files

`comm': Compare two sorted files line by line
=============================================

   `comm' writes to standard output lines that are common, and lines
that are unique, to two input files; a file name of `-' means standard
input.  Synopsis:

     comm [OPTION]... FILE1 FILE2

   The input files must be sorted before `comm' can be used.

   With no options, `comm' produces three-column output.  Column one
contains lines unique to FILE1, column two contains lines unique to
FILE2, and column three contains lines common to both files.  Columns
are separated by TAB.

   The options `-1', `-2', and `-3' suppress printing of the
corresponding columns.  Also see *Note Common options::.


File: textutils.info, Node: Operating on fields within a line, Next: Operating on characters, Prev: Operating on sorted files, Up: Top

Operating on fields within a line
*********************************

* Menu:

* cut invocation::              Print selected parts of lines.
* paste invocation::            Merge lines of files.
* join invocation::             Join lines on a common field.


File: textutils.info, Node: cut invocation, Next: paste invocation, Up: Operating on fields within a line

`cut': Print selected parts of lines
====================================

   `cut' writes to standard output selected parts of each line of each
input file, or standard input if no files are given or for a file name
of `-'.  Synopsis:

     cut [OPTION]... [FILE]...

   In the table which follows, the BYTE-LIST, CHARACTER-LIST, and
FIELD-LIST are one or more numbers or ranges (two numbers separated by
a dash) separated by commas.  Bytes, characters, and fields are
numbered starting at 1.  Incomplete ranges may be given: `-M' means
`1-M'; `N-' means `N' through end of line or last field.

   The program accepts the following options.  Also see *Note Common
options::.

`-b BYTE-LIST'
`--bytes=BYTE-LIST'
     Print only the bytes in positions listed in BYTE-LIST.  Tabs and
     backspaces are treated like any other character; they take up 1
     byte.

`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
     Print only characters in positions listed in CHARACTER-LIST.  The
     same as `-b' for now, but internationalization will change that.
     Tabs and backspaces are treated like any other character; they
     take up 1 character.

`-f FIELD-LIST'
`--fields=FIELD-LIST'
     Print only the fields listed in FIELD-LIST.  Fields are separated
     by a TAB by default.

`-d DELIM'
`--delimiter=DELIM'
     For `-f', fields are separated by the first character in DELIM
     (default is TAB).

`-n'
     Do not split multi-byte characters (no-op for now).

`-s'
`--only-delimited'
     For `-f', do not print lines that do not contain the field
     separator character.
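   As a brief sketch of these options (the plain-text file name is
invented; `/etc/passwd' is used only because its colon-separated layout
is familiar):

     cut -c 1-8 logins.txt
     cut -d : -f 1,7 /etc/passwd

   The first command prints the first eight characters of each line;
the second prints the first and seventh colon-separated fields, which
in `/etc/passwd' are the login name and the shell.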
File: textutils.info, Node: paste invocation, Next: join invocation, Prev: cut invocation, Up: Operating on fields within a line

`paste': Merge lines of files
=============================

   `paste' writes to standard output lines consisting of sequentially
corresponding lines of each given file, separated by TAB.  Standard
input is used for a file name of `-' or if no input files are given.
Synopsis:

     paste [OPTION]... [FILE]...

   The program accepts the following options.  Also see *Note Common
options::.

`-s'
`--serial'
     Paste the lines of one file at a time rather than one line from
     each file.

`-d DELIM-LIST'
`--delimiters DELIM-LIST'
     Consecutively use the characters in DELIM-LIST instead of TAB to
     separate merged lines.  When DELIM-LIST is exhausted, start again
     at its beginning.


File: textutils.info, Node: join invocation, Prev: paste invocation, Up: Operating on fields within a line

`join': Join lines on a common field
====================================

   `join' writes to standard output a line for each pair of input lines
that have identical join fields.  Synopsis:

     join [OPTION]... FILE1 FILE2

   Either FILE1 or FILE2 (but not both) can be `-', meaning standard
input.  FILE1 and FILE2 should be already sorted in increasing order
(not numerically) on the join fields; unless the `-t' option is given,
they should be sorted ignoring blanks at the start of the join field,
as in `sort -b'.  If the `--ignore-case' option is given, lines should
be sorted without regard to the case of characters in the join field,
as in `sort -f'.

   The defaults are: the join field is the first field in each line;
fields in the input are separated by one or more blanks, with leading
blanks on the line ignored; fields in the output are separated by a
space; each output line consists of the join field, the remaining
fields from FILE1, then the remaining fields from FILE2.

   The program accepts the following options.  Also see *Note Common
options::.

`-a FILE-NUMBER'
     Print a line for each unpairable line in file FILE-NUMBER (either
     `1' or `2'), in addition to the normal output.

`-e STRING'
     Replace those output fields that are missing in the input with
     STRING.

`--ignore-case'
     Ignore differences in case when comparing keys.  With this option,
     the lines of the input files must be ordered in the same way.  Use
     `sort -f' to produce this ordering.

`-1 FIELD'
`-j1 FIELD'
     Join on field FIELD (a positive integer) of file 1.

`-2 FIELD'
`-j2 FIELD'
     Join on field FIELD (a positive integer) of file 2.

`-j FIELD'
     Equivalent to `-1 FIELD -2 FIELD'.

`-o FIELD-LIST...'
     Construct each output line according to the format in FIELD-LIST.
     Each element in FIELD-LIST is either the single character `0' or
     has the form M.N where the file number, M, is `1' or `2' and N is
     a positive field number.

     A field specification of `0' denotes the join field.  In most
     cases, the functionality of the `0' field spec may be reproduced
     using the explicit M.N that corresponds to the join field.
     However, when printing unpairable lines (using either of the `-a'
     or `-v' options), there is no way to specify the join field using
     M.N in FIELD-LIST if there are unpairable lines in both files.  To
     give `join' that functionality, POSIX invented the `0' field
     specification notation.

     The elements in FIELD-LIST are separated by commas or blanks.
     Multiple FIELD-LIST arguments can be given after a single `-o'
     option; the values of all lists given with `-o' are concatenated
     together.
     All output lines--including those printed because of any `-a' or
     `-v' option--are subject to the specified FIELD-LIST.

`-t CHAR'
     Use character CHAR as the input and output field separator.

`-v FILE-NUMBER'
     Print a line for each unpairable line in file FILE-NUMBER (either
     `1' or `2'), instead of the normal output.

   In addition, when GNU `join' is invoked with exactly one argument,
options `--help' and `--version' are recognized.  *Note Common
options::.


File: textutils.info, Node: Operating on characters, Next: Opening the software toolbox, Prev: Operating on fields within a line, Up: Top

Operating on characters
***********************

   These commands operate on individual characters.

* Menu:

* tr invocation::               Translate, squeeze, and/or delete characters.
* expand invocation::           Convert tabs to spaces.
* unexpand invocation::         Convert spaces to tabs.


File: textutils.info, Node: tr invocation, Next: expand invocation, Up: Operating on characters

`tr': Translate, squeeze, and/or delete characters
==================================================

   Synopsis:

     tr [OPTION]... SET1 [SET2]

   `tr' copies standard input to standard output, performing one of the
following operations:

   * translate, and optionally squeeze repeated characters in the
     result,

   * squeeze repeated characters,

   * delete characters,

   * delete characters, then squeeze repeated characters from the
     result.

   The SET1 and (if given) SET2 arguments define ordered sets of
characters, referred to below as SET1 and SET2.  These sets are the
characters of the input that `tr' operates on.  The `--complement'
(`-c') option replaces SET1 with its complement (all of the characters
that are not in SET1).

* Menu:

* Character sets::              Specifying sets of characters.
* Translating::                 Changing one character to another.
* Squeezing::                   Squeezing repeats and deleting.
* Warnings in tr::              Warning messages.


File: textutils.info, Node: Character sets, Next: Translating, Up: tr invocation

Specifying sets of characters
-----------------------------

   The format of the SET1 and SET2 arguments resembles the format of
regular expressions; however, they are not regular expressions, only
lists of characters.  Most characters simply represent themselves in
these strings, but the strings can contain the shorthands listed below,
for convenience.  Some of them can be used only in SET1 or SET2, as
noted below.

Backslash escapes.  A backslash followed by a character not listed
below causes an error message.

    `\a'
          Control-G.

    `\b'
          Control-H.

    `\f'
          Control-L.

    `\n'
          Control-J.

    `\r'
          Control-M.

    `\t'
          Control-I.

    `\v'
          Control-K.

    `\OOO'
          The character with the value given by OOO, which is 1 to 3
          octal digits.

    `\\'
          A backslash.

Ranges.  The notation `M-N' expands to all of the characters from M
through N, in ascending order.  M should collate before N; if it
doesn't, an error results.  As an example, `0-9' is the same as
`0123456789'.  Although GNU `tr' does not support the System V syntax
that uses square brackets to enclose ranges, translations specified in
that format will still work as long as the brackets in SET1 correspond
to identical brackets in SET2.

Repeated characters.  The notation `[C*N]' in SET2 expands to N copies
of character C.  Thus, `[y*6]' is the same as `yyyyyy'.  The notation
`[C*]' in SET2 expands to as many copies of C as are needed to make
SET2 as long as SET1.  If N begins with `0', it is interpreted in
octal, otherwise in decimal.

Character classes.  The notation `[:CLASS:]' expands to all of the
characters in the (predefined) class CLASS.
The characters expand in no particular order, except for the `upper'
and `lower' classes, which expand in ascending order.  When the
`--delete' (`-d') and `--squeeze-repeats' (`-s') options are both
given, any character class can be used in SET2.  Otherwise, only the
character classes `lower' and `upper' are accepted in SET2, and then
only if the corresponding character class (`upper' and `lower',
respectively) is specified in the same relative position in SET1.
Doing this specifies case conversion.  The class names are given below;
an error results when an invalid class name is given.

    `alnum'
          Letters and digits.

    `alpha'
          Letters.

    `blank'
          Horizontal whitespace.

    `cntrl'
          Control characters.

    `digit'
          Digits.

    `graph'
          Printable characters, not including space.

    `lower'
          Lowercase letters.

    `print'
          Printable characters, including space.

    `punct'
          Punctuation characters.

    `space'
          Horizontal or vertical whitespace.

    `upper'
          Uppercase letters.

    `xdigit'
          Hexadecimal digits.

Equivalence classes.  The syntax `[=C=]' expands to all of the
characters that are equivalent to C, in no particular order.
Equivalence classes are a relatively recent invention intended to
support non-English alphabets.  But there seems to be no standard way
to define them or determine their contents.  Therefore, they are not
fully implemented in GNU `tr'; each character's equivalence class
consists only of that character, which is of no particular use.


File: textutils.info, Node: Translating, Next: Squeezing, Prev: Character sets, Up: tr invocation

Translating
-----------

   `tr' performs translation when SET1 and SET2 are both given and the
`--delete' (`-d') option is not given.  `tr' translates each character
of its input that is in SET1 to the corresponding character in SET2.
Characters not in SET1 are passed through unchanged.  When a character
appears more than once in SET1 and the corresponding characters in SET2
are not all the same, only the final one is used.  For example, these
two commands are equivalent:

     tr aaa xyz
     tr a z

   A common use of `tr' is to convert lowercase characters to
uppercase.  This can be done in many ways.  Here are three of them:

     tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
     tr a-z A-Z
     tr '[:lower:]' '[:upper:]'

   When `tr' is performing translation, SET1 and SET2 typically have
the same length.  If SET1 is shorter than SET2, the extra characters at
the end of SET2 are ignored.

   On the other hand, making SET1 longer than SET2 is not portable;
POSIX.2 says that the result is undefined.  In this situation, BSD `tr'
pads SET2 to the length of SET1 by repeating the last character of SET2
as many times as necessary.  System V `tr' truncates SET1 to the length
of SET2.

   By default, GNU `tr' handles this case like BSD `tr'.  When the
`--truncate-set1' (`-t') option is given, GNU `tr' handles this case
like the System V `tr' instead.  This option is ignored for operations
other than translation.

   Acting like System V `tr' in this case breaks the relatively common
BSD idiom:

     tr -cs A-Za-z0-9 '\012'

because it converts only zero bytes (the first element in the
complement of SET1), rather than all non-alphanumerics, to newlines.


File: textutils.info, Node: Squeezing, Next: Warnings in tr, Prev: Translating, Up: tr invocation

Squeezing repeats and deleting
------------------------------

   When given just the `--delete' (`-d') option, `tr' removes any input
characters that are in SET1.
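   For example (the file names here are hypothetical), deleting alone
is handy for stripping unwanted bytes, such as the carriage returns in
a DOS-format text file:

     tr -d '\r' < dos.txt > unix.txt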
   When given just the `--squeeze-repeats' (`-s') option, `tr' replaces
each input sequence of a repeated character that is in SET1 with a
single occurrence of that character.

   When given both `--delete' and `--squeeze-repeats', `tr' first
performs any deletions using SET1, then squeezes repeats from any
remaining characters using SET2.

   The `--squeeze-repeats' option may also be used when translating, in
which case `tr' first performs translation, then squeezes repeats from
any remaining characters using SET2.

   Here are some examples to illustrate various combinations of
options:

   * Remove all zero bytes:

          tr -d '\000'

   * Put all words on lines by themselves.  This converts all
     non-alphanumeric characters to newlines, then squeezes each string
     of repeated newlines into a single newline:

          tr -cs '[a-zA-Z0-9]' '[\n*]'

   * Convert each sequence of repeated newlines to a single newline:

          tr -s '\n'


File: textutils.info, Node: Warnings in tr, Prev: Squeezing, Up: tr invocation

Warning messages
----------------

   Setting the environment variable `POSIXLY_CORRECT' turns off the
following warning and error messages, for strict compliance with
POSIX.2.  Otherwise, the following diagnostics are issued:

  1. When the `--delete' option is given but `--squeeze-repeats' is
     not, and SET2 is given, GNU `tr' by default prints a usage message
     and exits, because SET2 would not be used.  The POSIX
     specification says that SET2 must be ignored in this case.
     Silently ignoring arguments is a bad idea.

  2. When an ambiguous octal escape is given.  For example, `\400' is
     actually `\40' followed by the digit `0', because the value 400
     octal does not fit into a single byte.

   GNU `tr' does not provide complete BSD or System V compatibility.
For example, it is impossible to disable interpretation of the POSIX
constructs `[:alpha:]', `[=c=]', and `[c*10]'.  Also, GNU `tr' does not
delete zero bytes automatically, unlike traditional Unix versions,
which provide no way to preserve zero bytes.


File: textutils.info, Node: expand invocation, Next: unexpand invocation, Prev: tr invocation, Up: Operating on characters

`expand': Convert tabs to spaces
================================

   `expand' writes the contents of each given FILE, or standard input
if none are given or for a FILE of `-', to standard output, with tab
characters converted to the appropriate number of spaces.  Synopsis:

     expand [OPTION]... [FILE]...

   By default, `expand' converts all tabs to spaces.  It preserves
backspace characters in the output; they decrement the column count for
tab calculations.  The default action is equivalent to `-8' (set tabs
every 8 columns).

   The program accepts the following options.  Also see *Note Common
options::.

`-TAB1[,TAB2]...'
`-t TAB1[,TAB2]...'
`--tabs=TAB1[,TAB2]...'
     If only one tab stop is given, set the tabs TAB1 spaces apart
     (default is 8).  Otherwise, set the tabs at columns TAB1, TAB2,
     ... (numbered from 0), and replace any tabs beyond the last
     tabstop given with single spaces.  If the tabstops are specified
     with the `-t' or `--tabs' option, they can be separated by blanks
     as well as by commas.

`-i'
`--initial'
     Only convert initial tabs (those that come before any non-blank
     character) on each line to spaces.
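   A small sketch of typical usage (the file names are invented):

     expand prog.c > prog.spaces.c
     expand -t 4 prog.c
     expand -i notes.txt

   The first command converts tabs at the default stops of every 8
columns, the second sets tab stops every 4 columns, and the third
converts only the tabs at the beginning of each line.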
File: textutils.info, Node: unexpand invocation, Prev: expand invocation, Up: Operating on characters

`unexpand': Convert spaces to tabs
==================================

   `unexpand' writes the contents of each given FILE, or standard input
if none are given or for a FILE of `-', to standard output, with
strings of two or more space or tab characters converted to as many
tabs as possible followed by as many spaces as are needed.  Synopsis:

     unexpand [OPTION]... [FILE]...

   By default, `unexpand' converts only initial spaces and tabs (those
that come before any non-blank character) on each line.  It preserves
backspace characters in the output; they decrement the column count for
tab calculations.  By default, tabs are set at every 8th column.

   The program accepts the following options.  Also see *Note Common
options::.

`-TAB1[,TAB2]...'
`-t TAB1[,TAB2]...'
`--tabs=TAB1[,TAB2]...'
     If only one tab stop is given, set the tabs TAB1 spaces apart
     instead of the default 8.  Otherwise, set the tabs at columns
     TAB1, TAB2, ... (numbered from 0), and leave spaces and tabs
     beyond the tabstops given unchanged.  If the tabstops are
     specified with the `-t' or `--tabs' option, they can be separated
     by blanks as well as by commas.  This option implies the `-a'
     option.

`-a'
`--all'
     Convert all strings of two or more spaces or tabs, not just
     initial ones, to tabs.


File: textutils.info, Node: Opening the software toolbox, Next: Index, Prev: Operating on characters, Up: Top

Opening the software toolbox
****************************

   This chapter originally appeared in `Linux Journal', volume 1,
number 2, in the `What's GNU?' column.  It was written by Arnold
Robbins.

* Menu:

* Toolbox introduction::
* I/O redirection::
* The `who' command::
* The `cut' command::
* The `sort' command::
* The `uniq' command::
* Putting the tools together::


File: textutils.info, Node: Toolbox introduction, Next: I/O redirection, Up: Opening the software toolbox

Toolbox introduction
====================

   This month's column is only peripherally related to the GNU Project,
in that it describes a number of the GNU tools on your Linux system and
how they might be used.  What it's really about is the "Software Tools"
philosophy of program development and usage.

   The software tools philosophy was an important and integral concept
in the initial design and development of Unix (of which Linux and GNU
are essentially clones).  Unfortunately, in the modern day press of
Internetworking and flashy GUIs, it seems to have fallen by the
wayside.  This is a shame, since it provides a powerful mental model
for solving many kinds of problems.

   Many people carry a Swiss Army knife around in their pants pockets
(or purse).  A Swiss Army knife is a handy tool to have: it has several
knife blades, a screwdriver, tweezers, toothpick, nail file, corkscrew,
and perhaps a number of other things on it.  For the everyday, small
miscellaneous jobs where you need a simple, general purpose tool, it's
just the thing.

   On the other hand, an experienced carpenter doesn't build a house
using a Swiss Army knife.  Instead, he has a toolbox chock full of
specialized tools--a saw, a hammer, a screwdriver, a plane, and so on.
And he knows exactly when and where to use each tool; you won't catch
him hammering nails with the handle of his screwdriver.

   The Unix developers at Bell Labs were all professional programmers
and trained computer scientists.
They had found that while a one-size-fits-all program might appeal to a
user because there's only one program to use, in practice such programs
are

  a. difficult to write,

  b. difficult to maintain and debug, and

  c. difficult to extend to meet new situations.

   Instead, they felt that programs should be specialized tools.  In
short, each program "should do one thing well."  No more and no less.
Such programs are simpler to design, write, and get right--they only do
one thing.

   Furthermore, they found that with the right machinery for hooking
programs together, the whole was greater than the sum of the parts.  By
combining several special purpose programs, you could accomplish a
specific task that none of the programs was designed for, and
accomplish it much more quickly and easily than if you had to write a
special purpose program.  We will see some (classic) examples of this
further on in the column.  (An important additional point was that, if
necessary, you should take a detour and build any software tools you
may need first, if you don't already have something appropriate in the
toolbox.)


File: textutils.info, Node: I/O redirection, Next: The `who' command, Prev: Toolbox introduction, Up: Opening the software toolbox

I/O redirection
===============

   Hopefully, you are familiar with the basics of I/O redirection in
the shell, in particular the concepts of "standard input," "standard
output," and "standard error".  Briefly, "standard input" is a data
source, where data comes from.  A program should not need to either
know or care if the data source is a disk file, a keyboard, a magnetic
tape, or even a punched card reader.  Similarly, "standard output" is a
data sink, where data goes to.  The program should neither know nor
care where this might be.  Programs that only read their standard
input, do something to the data, and then send it on, are called
"filters", by analogy to filters in a water pipeline.

   With the Unix shell, it's very easy to set up data pipelines:

     program_to_create_data | filter1 | .... | filterN > final.pretty.data

   We start out by creating the raw data; each filter applies some
successive transformation to the data, until by the time it comes out
of the pipeline, it is in the desired form.

   This is fine and good for standard input and standard output.  Where
does standard error come into play?  Well, think about `filter1' in the
pipeline above.  What happens if it encounters an error in the data it
sees?  If it writes an error message to standard output, it will just
disappear down the pipeline into `filter2''s input, and the user will
probably never see it.  So programs need a place where they can send
error messages so that the user will notice them.  This is standard
error, and it is usually connected to your console or window, even if
you have redirected standard output of your program away from your
screen.

   For filter programs to work together, the format of the data has to
be agreed upon.  The most straightforward and easiest format to use is
simply lines of text.  Unix data files are generally just streams of
bytes, with lines delimited by the ASCII LF (Line Feed) character,
conventionally called a "newline" in the Unix literature.  (This is
`'\n'' if you're a C programmer.)  This is the format used by all the
traditional filtering programs.  (Many earlier operating systems had
elaborate facilities and special purpose programs for managing binary
data.
Unix has always shied away from such things, under the philosophy that
it's easiest to simply be able to view and edit your data with a text
editor.)

   OK, enough introduction.  Let's take a look at some of the tools,
and then we'll see how to hook them together in interesting ways.  In
the following discussion, we will only present those command line
options that interest us.  As you should always do, double check your
system documentation for the full story.


File: textutils.info, Node: The `who' command, Next: The `cut' command, Prev: I/O redirection, Up: Opening the software toolbox

The `who' command
=================

   The first program is the `who' command.  By itself, it generates a
list of the users who are currently logged in.  Although I'm writing
this on a single-user system, we'll pretend that several people are
logged in:

     $ who
     arnold   console Jan 22 19:57
     miriam   ttyp0   Jan 23 14:19(:0.0)
     bill     ttyp1   Jan 21 09:32(:0.0)
     arnold   ttyp2   Jan 23 20:48(:0.0)

   Here, the `$' is the usual shell prompt, at which I typed `who'.
There are three people logged in, and I am logged in twice.  On
traditional Unix systems, user names are never more than eight
characters long.  This little bit of trivia will be useful later.  The
output of `who' is nice, but the data is not all that exciting.


File: textutils.info, Node: The `cut' command, Next: The `sort' command, Prev: The `who' command, Up: Opening the software toolbox

The `cut' command
=================

   The next program we'll look at is the `cut' command.  This program
cuts out columns or fields of input data.  For example, we can tell it
to print just the login name and full name from the `/etc/passwd' file.
The `/etc/passwd' file has seven fields, separated by colons:

     arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh

   To get the first and fifth fields, we would use `cut' like this:

     $ cut -d: -f1,5 /etc/passwd
     root:Operator
     ...
     arnold:Arnold D. Robbins
     miriam:Miriam A. Robbins
     ...

   With the `-c' option, `cut' will cut out specific characters (i.e.,
columns) in the input lines.  This command looks like it might be
useful for data filtering.


File: textutils.info, Node: The `sort' command, Next: The `uniq' command, Prev: The `cut' command, Up: Opening the software toolbox

The `sort' command
==================

   Next we'll look at the `sort' command.  This is one of the most
powerful commands on a Unix-style system; one that you will often find
yourself using when setting up fancy data plumbing.

   The `sort' command reads and sorts each file named on the command
line.  It then merges the sorted data and writes it to standard output.
It will read standard input if no files are given on the command line
(thus making it into a filter).  The sort is based on the machine
collating sequence (ASCII) or based on user-supplied ordering criteria.


File: textutils.info, Node: The `uniq' command, Next: Putting the tools together, Prev: The `sort' command, Up: Opening the software toolbox

The `uniq' command
==================

   Finally (at least for now), we'll look at the `uniq' program.  When
sorting data, you will often end up with duplicate lines, lines that
are identical.  Usually, all you need is one instance of each line.
This is where `uniq' comes in.

   The `uniq' program reads its standard input, which it expects to be
sorted.  It only prints out one copy of each duplicated line.  It does
have several options.  Later on, we'll use the `-c' option, which
prints each unique line, preceded by a count of the number of times
that line occurred in the input.
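   As a small, hypothetical preview of `-c' (the input file is
invented, and the exact alignment of the counts may differ):

     $ sort guests.txt | uniq -c
           3 arnold
           1 bill
           2 miriam

   Each distinct line appears once, preceded by the number of times it
occurred in the sorted input.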