This is Info file textutils.info, produced by Makeinfo-1.64 from the
input file /ade-src/fsf/textutils/doc/textutils.texi.

START-INFO-DIR-ENTRY
* Text utilities: (textutils).          GNU text utilities.
* cat: (textutils)cat invocation.       Concatenate and write files.
* cksum: (textutils)cksum invocation.   Print POSIX CRC checksum.
* comm: (textutils)comm invocation.     Compare sorted files by line.
* csplit: (textutils)csplit invocation. Split by context.
* cut: (textutils)cut invocation.       Print selected parts of lines.
* expand: (textutils)expand invocation. Convert tabs to spaces.
* fmt: (textutils)fmt invocation.       Reformat paragraph text.
* fold: (textutils)fold invocation.     Wrap long input lines.
* head: (textutils)head invocation.     Output the first part of files.
* join: (textutils)join invocation.     Join lines on a common field.
* md5sum: (textutils)md5sum invocation. Print or check message-digests.
* nl: (textutils)nl invocation.         Number lines and write files.
* od: (textutils)od invocation.         Dump files in octal, etc.
* paste: (textutils)paste invocation.   Merge lines of files.
* pr: (textutils)pr invocation.         Paginate or columnate files.
* sort: (textutils)sort invocation.     Sort text files.
* split: (textutils)split invocation.   Split into fixed-size pieces.
* sum: (textutils)sum invocation.       Print traditional checksum.
* tac: (textutils)tac invocation.       Reverse files.
* tail: (textutils)tail invocation.     Output the last part of files.
* tr: (textutils)tr invocation.         Translate characters.
* unexpand: (textutils)unexpand invocation. Convert spaces to tabs.
* uniq: (textutils)uniq invocation.     Uniqify files.
* wc: (textutils)wc invocation.         Byte, word, and line counts.
END-INFO-DIR-ENTRY

   This file documents the GNU text utilities.

   Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

File: textutils.info, Node: sort invocation, Next: uniq invocation, Up: Operating on sorted files

`sort': Sort text files
=======================

   `sort' sorts, merges, or compares all the lines from the given
files, or standard input if none are given or for a FILE of `-'.  By
default, `sort' writes the results to standard output.  Synopsis:

     sort [OPTION]... [FILE]...

   `sort' has three modes of operation: sort (the default), merge, and
check for sortedness.  The following options change the operation mode:

`-c'
     Check whether the given files are already sorted: if they are not
     all sorted, print an error message and exit with a status of 1.
     Otherwise, exit successfully.

`-m'
     Merge the given files by sorting them as a group.  Each input file
     must already be individually sorted.  Sorting instead of merging
     always works; merging is provided because it is faster in the
     cases where it applies.
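   As a quick, hypothetical illustration of the two non-default modes
(the file names below are invented for this sketch):

     sort -c access.log || echo 'access.log is not sorted'
     sort -m jan.sorted feb.sorted > q1.sorted

   The first command only verifies order, printing a diagnostic and
exiting with status 1 when a line is out of order; the second merges
two files that are each already sorted into one sorted output file.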
   A pair of lines is compared as follows: if any key fields have been
specified, `sort' compares each pair of fields, in the order specified
on the command line, according to the associated ordering options,
until a difference is found or no fields are left.  If any of the
global options `Mbdfinr' are given but no key fields are specified,
`sort' compares the entire lines according to the global options.
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), `sort' compares the lines byte
by byte in machine collating sequence.  The last resort comparison
honors the `-r' global option.  The `-s' (stable) option disables this
last-resort comparison so that lines in which all fields compare equal
are left in their original relative order.  If no fields or global
options are specified, `-s' has no effect.

   GNU `sort' (as specified for all GNU utilities) has no limits on
input line length or restrictions on bytes allowed within lines.  In
addition, if the final byte of an input file is not a newline, GNU
`sort' silently supplies one.

   Upon any error, `sort' exits with a status of `2'.

   If the environment variable `TMPDIR' is set, `sort' uses its value
as the directory for temporary files instead of `/tmp'.  The
`-T TEMPDIR' option in turn overrides the environment variable.

   The following options affect the ordering of output lines.  They may
be specified globally or as part of a specific key field.  If no key
fields are specified, global options apply to comparison of entire
lines; otherwise the global options are inherited by key fields that do
not specify any special options of their own.

`-b'
     Ignore leading blanks when finding sort keys in each line.

`-d'
     Sort in "phone directory" order: ignore all characters except
     letters, digits and blanks when sorting.

`-f'
     Fold lowercase characters into the equivalent uppercase characters
     when sorting so that, for example, `b' and `B' sort as equal.

`-g'
     Sort numerically, but use strtod(3) to arrive at the numeric
     values.  This allows floating point numbers to be specified in
     scientific notation, like `1.0e-34' and `10e100'.  Use this option
     only if there is no alternative; it is much slower than `-n', and
     numbers with too many significant digits will be compared as if
     they had been truncated.  In addition, numbers outside the range
     of representable double precision floating point numbers are
     treated as if they were zeroes; overflow and underflow are not
     reported.

`-i'
     Ignore characters outside the printable ASCII range 040-0176 octal
     (inclusive) when sorting.

`-M'
     An initial string, consisting of any amount of whitespace,
     followed by three letters abbreviating a month name, is folded to
     uppercase and compared in the order `JAN' < `FEB' < ... < `DEC'.
     Invalid names compare low to valid names.

`-n'
     Sort numerically: the number begins each line; specifically, it
     consists of optional leading whitespace, an optional `-' sign, and
     zero or more digits, optionally followed by a decimal point and
     zero or more digits.

     `sort -n' uses what might be considered an unconventional method
     to compare strings representing floating point numbers.  Rather
     than first converting each string to the C `double' type and then
     comparing those values, `sort' aligns the decimal points in the
     two strings and compares the strings a character at a time.  One
     benefit of this approach is its speed.  In practice it is much
     more efficient than performing the two corresponding
     string-to-double (or even string-to-integer) conversions and then
     comparing doubles.
     In addition, there is no corresponding loss of precision.
     Converting each string to `double' before comparison would limit
     precision to about 16 digits on most systems.

     Neither a leading `+' nor exponential notation is recognized.  To
     compare such strings numerically, use the `-g' option.

`-r'
     Reverse the result of comparison, so that lines with greater key
     values appear earlier in the output instead of later.

   Other options are:

`-o OUTPUT-FILE'
     Write output to OUTPUT-FILE instead of standard output.  If
     OUTPUT-FILE is one of the input files, `sort' copies it to a
     temporary file before sorting and writing the output to
     OUTPUT-FILE.

`-t SEPARATOR'
     Use character SEPARATOR as the field separator when finding the
     sort keys in each line.  By default, fields are separated by the
     empty string between a non-whitespace character and a whitespace
     character.  That is, given the input line ` foo bar', `sort'
     breaks it into fields ` foo' and ` bar'.  The field separator is
     not considered to be part of either the field preceding or the
     field following.

`-u'
     For the default case or the `-m' option, only output the first of
     a sequence of lines that compare equal.  For the `-c' option,
     check that no pair of consecutive lines compares equal.

`-k POS1[,POS2]'
     The recommended, POSIX, option for specifying a sort field.  The
     field consists of the part of the line between POS1 and POS2 (or
     the end of the line, if POS2 is omitted), inclusive.  Fields and
     character positions are numbered starting with 1.  See below.

`-z'
     Treat the input as a set of lines, each terminated by a zero byte
     (ASCII NUL (Null) character) instead of an ASCII LF (Line Feed).
     This option can be useful in conjunction with `perl -0' or
     `find -print0' and `xargs -0', which do the same in order to
     reliably handle arbitrary pathnames (even those which contain Line
     Feed characters).

`+POS1[-POS2]'
     The obsolete, traditional option for specifying a sort field.  The
     field consists of the part of the line beginning at POS1 and
     extending up to, but *not including*, POS2 (or the end of the line
     if POS2 is omitted).  Fields and character positions are numbered
     starting with 0.  See below.

   In addition, when GNU `sort' is invoked with exactly one argument,
options `--help' and `--version' are recognized.  *Note Common
options::.

   Historical (BSD and System V) implementations of `sort' have
differed in their interpretation of some options, particularly `-b',
`-f', and `-n'.  GNU `sort' follows the POSIX behavior, which is
usually (but not always!) like the System V behavior.  According to
POSIX, `-n' no longer implies `-b'.  For consistency, `-M' has been
changed in the same way.  This may affect the meaning of character
positions in field specifications in obscure cases.  The only fix is to
add an explicit `-b'.

   A position in a sort field specified with the `-k' or `+' option has
the form `F.C', where F is the number of the field to use and C is the
number of the first character from the beginning of the field (for
`+POS') or from the end of the previous field (for `-POS').  If the
`.C' is omitted, it is taken to be the first character in the field.
If the `-b' option was specified, the `.C' part of a field
specification is counted from the first nonblank character of the field
(for `+POS') or from the first nonblank character following the
previous field (for `-POS').

   A sort key option may also have any of the option letters `Mbdfinr'
appended to it, in which case the global ordering options are not used
for that particular field.
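   To make the two numbering schemes concrete, here is a small sketch
(the data file name is invented for illustration).  The first two
commands are equivalent and sort on the second field only, one using
the POSIX `-k' syntax and the other the obsolete zero-based syntax; the
third sorts on characters 2 through 4 of the first field:

     sort -k 2,2 data.txt
     sort +1 -2 data.txt
     sort -k 1.2,1.4 data.txt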
   The `-b' option may be independently attached to either or both of
the `+POS' and `-POS' parts of a field specification, and if it is
inherited from the global options it will be attached to both.  If a
`-n' or `-M' option is used, thus implying a `-b' option, the `-b'
option is taken to apply to both the `+POS' and the `-POS' parts of a
key specification.  Keys may span multiple fields.

   Here are some examples to illustrate various combinations of
options.  In them, the POSIX `-k' option is used to specify sort keys
rather than the obsolete `+POS1-POS2' syntax.

   * Sort in descending (reverse) numeric order.

          sort -nr

   * Sort alphabetically, omitting the first and second fields.  This
     uses a single key composed of the characters beginning at the
     start of field three and extending to the end of each line.

          sort -k3

   * Sort numerically on the second field and resolve ties by sorting
     alphabetically on the third and fourth characters of field five.
     Use `:' as the field delimiter.

          sort -t : -k 2,2n -k 5.3,5.4

     Note that if you had written `-k 2' instead of `-k 2,2', `sort'
     would have used all characters beginning in the second field and
     extending to the end of the line as the primary *numeric* key.
     For the large majority of applications, treating keys spanning
     more than one field as numeric will not do what you expect.

     Also note that the `n' modifier was applied to the field-end
     specifier for the first key.  It would have been equivalent to
     specify `-k 2n,2' or `-k 2n,2n'.  All modifiers except `b' apply
     to the associated *field*, regardless of whether the modifier
     character is attached to the field-start and/or the field-end part
     of the key specifier.

   * Sort the password file on the fifth field and ignore any leading
     white space.  Sort lines with equal values in field five on the
     numeric user ID in field three.

          sort -t : -k 5b,5 -k 3,3n /etc/passwd

     An alternative is to use the global numeric modifier `-n'.

          sort -t : -n -k 5b,5 -k 3,3 /etc/passwd

     Finally, to ignore both leading and trailing white space, you
     could have applied the `b' modifier to the field-end specifier for
     the first key,

          sort -t : -n -k 5b,5b -k 3,3 /etc/passwd

     or you could have used the global `-b' modifier instead of `-n'
     and an explicit `n' with the second key specifier.

          sort -t : -b -k 5,5 -k 3,3n /etc/passwd

   * Generate a tags file in case insensitive sorted order.

          find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append

     The use of `-print0', `-z', and `-0' in this case means that
     pathnames that contain Line Feed characters will not get broken up
     by the sort operation.


File: textutils.info, Node: uniq invocation, Next: comm invocation, Prev: sort invocation, Up: Operating on sorted files

`uniq': Uniqify files
=====================

   `uniq' writes the unique lines in the given `input', or standard
input if nothing is given or for an INPUT name of `-'.  Synopsis:

     uniq [OPTION]... [INPUT [OUTPUT]]

   By default, `uniq' prints the unique lines in a sorted file, i.e.,
discards all but one of identical successive lines.  Optionally, it can
instead show only lines that appear exactly once, or lines that appear
more than once.

   The input must be sorted.  If your input is not sorted, perhaps you
want to use `sort -u'.

   If no OUTPUT file is specified, `uniq' writes to standard output.

   The program accepts the following options.  Also see *Note Common
options::.

`-f N'
`--skip-fields=N'
     Skip N fields on each line before checking for uniqueness.  Fields
     are sequences of non-space, non-tab characters that are separated
     from each other by at least one space or tab.
`-s N'
`--skip-chars=N'
     Skip N characters before checking for uniqueness.  If you use both
     the field and character skipping options, fields are skipped over
     first.

`-c'
`--count'
     Print the number of times each line occurred along with the line.

`-i'
`--ignore-case'
     Ignore differences in case when comparing lines.

`-d'
`--repeated'
     Print only duplicate lines.

`-u'
`--unique'
     Print only unique lines.

`-w N'
`--check-chars=N'
     Compare N characters on each line (after skipping any specified
     fields and characters).  By default the entire rest of the lines
     are compared.


File: textutils.info, Node: comm invocation, Prev: uniq invocation, Up: Operating on sorted files

`comm': Compare two sorted files line by line
=============================================

   `comm' writes to standard output lines that are common, and lines
that are unique, to two input files; a file name of `-' means standard
input.  Synopsis:

     comm [OPTION]... FILE1 FILE2

   The input files must be sorted before `comm' can be used.

   With no options, `comm' produces three-column output.  Column one
contains lines unique to FILE1, column two contains lines unique to
FILE2, and column three contains lines common to both files.  Columns
are separated by TAB.

   The options `-1', `-2', and `-3' suppress printing of the
corresponding columns.  Also see *Note Common options::.


File: textutils.info, Node: Operating on fields within a line, Next: Operating on characters, Prev: Operating on sorted files, Up: Top

Operating on fields within a line
*********************************

* Menu:

* cut invocation::              Print selected parts of lines.
* paste invocation::            Merge lines of files.
* join invocation::             Join lines on a common field.


File: textutils.info, Node: cut invocation, Next: paste invocation, Up: Operating on fields within a line

`cut': Print selected parts of lines
====================================

   `cut' writes to standard output selected parts of each line of each
input file, or standard input if no files are given or for a file name
of `-'.  Synopsis:

     cut [OPTION]... [FILE]...

   In the table which follows, the BYTE-LIST, CHARACTER-LIST, and
FIELD-LIST are one or more numbers or ranges (two numbers separated by
a dash) separated by commas.  Bytes, characters, and fields are
numbered starting at 1.  Incomplete ranges may be given: `-M' means
`1-M'; `N-' means `N' through end of line or last field.

   The program accepts the following options.  Also see *Note Common
options::.

`-b BYTE-LIST'
`--bytes=BYTE-LIST'
     Print only the bytes in positions listed in BYTE-LIST.  Tabs and
     backspaces are treated like any other character; they take up 1
     byte.

`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
     Print only characters in positions listed in CHARACTER-LIST.  The
     same as `-b' for now, but internationalization will change that.
     Tabs and backspaces are treated like any other character; they
     take up 1 character.

`-f FIELD-LIST'
`--fields=FIELD-LIST'
     Print only the fields listed in FIELD-LIST.  Fields are separated
     by a TAB by default.

`-d DELIM'
`--delimiter=DELIM'
     For `-f', fields are separated by the first character in DELIM
     (default is TAB).

`-n'
     Do not split multi-byte characters (no-op for now).

`-s'
`--only-delimited'
     For `-f', do not print lines that do not contain the field
     separator character.
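   As a brief sketch of these options (the plain-text file name is
invented; `/etc/passwd' is used only because its colon-separated layout
is familiar):

     cut -c 1-8 logins.txt
     cut -d : -f 1,7 /etc/passwd

   The first command prints the first eight characters of each line;
the second prints the first and seventh colon-separated fields, which
in `/etc/passwd' are the login name and the shell.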
File: textutils.info, Node: paste invocation, Next: join invocation, Prev: cut invocation, Up: Operating on fields within a line

`paste': Merge lines of files
=============================

   `paste' writes to standard output lines consisting of sequentially
corresponding lines of each given file, separated by TAB.  Standard
input is used for a file name of `-' or if no input files are given.
Synopsis:

     paste [OPTION]... [FILE]...

   The program accepts the following options.  Also see *Note Common
options::.

`-s'
`--serial'
     Paste the lines of one file at a time rather than one line from
     each file.

`-d DELIM-LIST'
`--delimiters DELIM-LIST'
     Consecutively use the characters in DELIM-LIST instead of TAB to
     separate merged lines.  When DELIM-LIST is exhausted, start again
     at its beginning.


File: textutils.info, Node: join invocation, Prev: paste invocation, Up: Operating on fields within a line

`join': Join lines on a common field
====================================

   `join' writes to standard output a line for each pair of input lines
that have identical join fields.  Synopsis:

     join [OPTION]... FILE1 FILE2

   Either FILE1 or FILE2 (but not both) can be `-', meaning standard
input.  FILE1 and FILE2 should be already sorted in increasing order
(not numerically) on the join fields; unless the `-t' option is given,
they should be sorted ignoring blanks at the start of the join field,
as in `sort -b'.  If the `--ignore-case' option is given, lines should
be sorted without regard to the case of characters in the join field,
as in `sort -f'.

   The defaults are: the join field is the first field in each line;
fields in the input are separated by one or more blanks, with leading
blanks on the line ignored; fields in the output are separated by a
space; each output line consists of the join field, the remaining
fields from FILE1, then the remaining fields from FILE2.

   The program accepts the following options.  Also see *Note Common
options::.

`-a FILE-NUMBER'
     Print a line for each unpairable line in file FILE-NUMBER (either
     `1' or `2'), in addition to the normal output.

`-e STRING'
     Replace those output fields that are missing in the input with
     STRING.

`--ignore-case'
     Ignore differences in case when comparing keys.  With this option,
     the lines of the input files must be ordered in the same way.  Use
     `sort -f' to produce this ordering.

`-1 FIELD'
`-j1 FIELD'
     Join on field FIELD (a positive integer) of file 1.

`-2 FIELD'
`-j2 FIELD'
     Join on field FIELD (a positive integer) of file 2.

`-j FIELD'
     Equivalent to `-1 FIELD -2 FIELD'.

`-o FIELD-LIST...'
     Construct each output line according to the format in FIELD-LIST.
     Each element in FIELD-LIST is either the single character `0' or
     has the form M.N where the file number, M, is `1' or `2' and N is
     a positive field number.

     A field specification of `0' denotes the join field.  In most
     cases, the functionality of the `0' field spec may be reproduced
     using the explicit M.N that corresponds to the join field.
     However, when printing unpairable lines (using either of the `-a'
     or `-v' options), there is no way to specify the join field using
     M.N in FIELD-LIST if there are unpairable lines in both files.  To
     give `join' that functionality, POSIX invented the `0' field
     specification notation.

     The elements in FIELD-LIST are separated by commas or blanks.
     Multiple FIELD-LIST arguments can be given after a single `-o'
     option; the values of all lists given with `-o' are concatenated
     together.
     All output lines--including those printed because of any `-a' or
     `-v' option--are subject to the specified FIELD-LIST.

`-t CHAR'
     Use character CHAR as the input and output field separator.

`-v FILE-NUMBER'
     Print a line for each unpairable line in file FILE-NUMBER (either
     `1' or `2'), instead of the normal output.

   In addition, when GNU `join' is invoked with exactly one argument,
options `--help' and `--version' are recognized.  *Note Common
options::.


File: textutils.info, Node: Operating on characters, Next: Opening the software toolbox, Prev: Operating on fields within a line, Up: Top

Operating on characters
***********************

   These commands operate on individual characters.

* Menu:

* tr invocation::               Translate, squeeze, and/or delete characters.
* expand invocation::           Convert tabs to spaces.
* unexpand invocation::         Convert spaces to tabs.


File: textutils.info, Node: tr invocation, Next: expand invocation, Up: Operating on characters

`tr': Translate, squeeze, and/or delete characters
==================================================

   Synopsis:

     tr [OPTION]... SET1 [SET2]

   `tr' copies standard input to standard output, performing one of the
following operations:

   * translate, and optionally squeeze repeated characters in the
     result,

   * squeeze repeated characters,

   * delete characters,

   * delete characters, then squeeze repeated characters from the
     result.

   The SET1 and (if given) SET2 arguments define ordered sets of
characters, referred to below as SET1 and SET2.  These sets are the
characters of the input that `tr' operates on.  The `--complement'
(`-c') option replaces SET1 with its complement (all of the characters
that are not in SET1).

* Menu:

* Character sets::              Specifying sets of characters.
* Translating::                 Changing one character to another.
* Squeezing::                   Squeezing repeats and deleting.
* Warnings in tr::              Warning messages.


File: textutils.info, Node: Character sets, Next: Translating, Up: tr invocation

Specifying sets of characters
-----------------------------

   The format of the SET1 and SET2 arguments resembles the format of
regular expressions; however, they are not regular expressions, only
lists of characters.  Most characters simply represent themselves in
these strings, but the strings can contain the shorthands listed below,
for convenience.  Some of them can be used only in SET1 or SET2, as
noted below.

Backslash escapes.  A backslash followed by a character not listed
below causes an error message.

    `\a'
          Control-G.

    `\b'
          Control-H.

    `\f'
          Control-L.

    `\n'
          Control-J.

    `\r'
          Control-M.

    `\t'
          Control-I.

    `\v'
          Control-K.

    `\OOO'
          The character with the value given by OOO, which is 1 to 3
          octal digits.

    `\\'
          A backslash.

Ranges.  The notation `M-N' expands to all of the characters from M
through N, in ascending order.  M should collate before N; if it
doesn't, an error results.  As an example, `0-9' is the same as
`0123456789'.  Although GNU `tr' does not support the System V syntax
that uses square brackets to enclose ranges, translations specified in
that format will still work as long as the brackets in SET1 correspond
to identical brackets in SET2.

Repeated characters.  The notation `[C*N]' in SET2 expands to N copies
of character C.  Thus, `[y*6]' is the same as `yyyyyy'.  The notation
`[C*]' in SET2 expands to as many copies of C as are needed to make
SET2 as long as SET1.  If N begins with `0', it is interpreted in
octal, otherwise in decimal.

Character classes.  The notation `[:CLASS:]' expands to all of the
characters in the (predefined) class CLASS.
The characters expand in no particular order, except for the `upper'
and `lower' classes, which expand in ascending order.  When the
`--delete' (`-d') and `--squeeze-repeats' (`-s') options are both
given, any character class can be used in SET2.  Otherwise, only the
character classes `lower' and `upper' are accepted in SET2, and then
only if the corresponding character class (`upper' and `lower',
respectively) is specified in the same relative position in SET1.
Doing this specifies case conversion.  The class names are given below;
an error results when an invalid class name is given.

    `alnum'
          Letters and digits.

    `alpha'
          Letters.

    `blank'
          Horizontal whitespace.

    `cntrl'
          Control characters.

    `digit'
          Digits.

    `graph'
          Printable characters, not including space.

    `lower'
          Lowercase letters.

    `print'
          Printable characters, including space.

    `punct'
          Punctuation characters.

    `space'
          Horizontal or vertical whitespace.

    `upper'
          Uppercase letters.

    `xdigit'
          Hexadecimal digits.

Equivalence classes.  The syntax `[=C=]' expands to all of the
characters that are equivalent to C, in no particular order.
Equivalence classes are a relatively recent invention intended to
support non-English alphabets.  But there seems to be no standard way
to define them or determine their contents.  Therefore, they are not
fully implemented in GNU `tr'; each character's equivalence class
consists only of that character, which is of no particular use.


File: textutils.info, Node: Translating, Next: Squeezing, Prev: Character sets, Up: tr invocation

Translating
-----------

   `tr' performs translation when SET1 and SET2 are both given and the
`--delete' (`-d') option is not given.  `tr' translates each character
of its input that is in SET1 to the corresponding character in SET2.
Characters not in SET1 are passed through unchanged.  When a character
appears more than once in SET1 and the corresponding characters in SET2
are not all the same, only the final one is used.  For example, these
two commands are equivalent:

     tr aaa xyz
     tr a z

   A common use of `tr' is to convert lowercase characters to
uppercase.  This can be done in many ways.  Here are three of them:

     tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
     tr a-z A-Z
     tr '[:lower:]' '[:upper:]'

   When `tr' is performing translation, SET1 and SET2 typically have
the same length.  If SET1 is shorter than SET2, the extra characters at
the end of SET2 are ignored.

   On the other hand, making SET1 longer than SET2 is not portable;
POSIX.2 says that the result is undefined.  In this situation, BSD `tr'
pads SET2 to the length of SET1 by repeating the last character of SET2
as many times as necessary.  System V `tr' truncates SET1 to the length
of SET2.

   By default, GNU `tr' handles this case like BSD `tr'.  When the
`--truncate-set1' (`-t') option is given, GNU `tr' handles this case
like the System V `tr' instead.  This option is ignored for operations
other than translation.

   Acting like System V `tr' in this case breaks the relatively common
BSD idiom:

     tr -cs A-Za-z0-9 '\012'

because it converts only zero bytes (the first element in the
complement of SET1), rather than all non-alphanumerics, to newlines.


File: textutils.info, Node: Squeezing, Next: Warnings in tr, Prev: Translating, Up: tr invocation

Squeezing repeats and deleting
------------------------------

   When given just the `--delete' (`-d') option, `tr' removes any input
characters that are in SET1.
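   For example (the file names here are hypothetical), deleting alone
is handy for stripping unwanted bytes, such as the carriage returns in
a DOS-format text file:

     tr -d '\r' < dos.txt > unix.txt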
   When given just the `--squeeze-repeats' (`-s') option, `tr' replaces
each input sequence of a repeated character that is in SET1 with a
single occurrence of that character.

   When given both `--delete' and `--squeeze-repeats', `tr' first
performs any deletions using SET1, then squeezes repeats from any
remaining characters using SET2.

   The `--squeeze-repeats' option may also be used when translating, in
which case `tr' first performs translation, then squeezes repeats from
any remaining characters using SET2.

   Here are some examples to illustrate various combinations of
options:

   * Remove all zero bytes:

          tr -d '\000'

   * Put all words on lines by themselves.  This converts all
     non-alphanumeric characters to newlines, then squeezes each string
     of repeated newlines into a single newline:

          tr -cs '[a-zA-Z0-9]' '[\n*]'

   * Convert each sequence of repeated newlines to a single newline:

          tr -s '\n'


File: textutils.info, Node: Warnings in tr, Prev: Squeezing, Up: tr invocation

Warning messages
----------------

   Setting the environment variable `POSIXLY_CORRECT' turns off the
following warning and error messages, for strict compliance with
POSIX.2.  Otherwise, the following diagnostics are issued:

  1. When the `--delete' option is given but `--squeeze-repeats' is
     not, and SET2 is given, GNU `tr' by default prints a usage message
     and exits, because SET2 would not be used.  The POSIX
     specification says that SET2 must be ignored in this case.
     Silently ignoring arguments is a bad idea.

  2. When an ambiguous octal escape is given.  For example, `\400' is
     actually `\40' followed by the digit `0', because the value 400
     octal does not fit into a single byte.

   GNU `tr' does not provide complete BSD or System V compatibility.
For example, it is impossible to disable interpretation of the POSIX
constructs `[:alpha:]', `[=c=]', and `[c*10]'.  Also, GNU `tr' does not
delete zero bytes automatically, unlike traditional Unix versions,
which provide no way to preserve zero bytes.


File: textutils.info, Node: expand invocation, Next: unexpand invocation, Prev: tr invocation, Up: Operating on characters

`expand': Convert tabs to spaces
================================

   `expand' writes the contents of each given FILE, or standard input
if none are given or for a FILE of `-', to standard output, with tab
characters converted to the appropriate number of spaces.  Synopsis:

     expand [OPTION]... [FILE]...

   By default, `expand' converts all tabs to spaces.  It preserves
backspace characters in the output; they decrement the column count for
tab calculations.  The default action is equivalent to `-8' (set tabs
every 8 columns).

   The program accepts the following options.  Also see *Note Common
options::.

`-TAB1[,TAB2]...'
`-t TAB1[,TAB2]...'
`--tabs=TAB1[,TAB2]...'
     If only one tab stop is given, set the tabs TAB1 spaces apart
     (default is 8).  Otherwise, set the tabs at columns TAB1, TAB2,
     ... (numbered from 0), and replace any tabs beyond the last
     tabstop given with single spaces.  If the tabstops are specified
     with the `-t' or `--tabs' option, they can be separated by blanks
     as well as by commas.

`-i'
`--initial'
     Only convert initial tabs (those that come before any non-blank
     character) on each line to spaces.
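   A small sketch of typical usage (the file names are invented):

     expand prog.c > prog.spaces.c
     expand -t 4 prog.c
     expand -i notes.txt

   The first command converts tabs at the default stops of every 8
columns, the second sets tab stops every 4 columns, and the third
converts only the tabs at the beginning of each line.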
File: textutils.info, Node: unexpand invocation, Prev: expand invocation, Up: Operating on characters

`unexpand': Convert spaces to tabs
==================================

   `unexpand' writes the contents of each given FILE, or standard input
if none are given or for a FILE of `-', to standard output, with
strings of two or more space or tab characters converted to as many
tabs as possible followed by as many spaces as are needed.  Synopsis:

     unexpand [OPTION]... [FILE]...

   By default, `unexpand' converts only initial spaces and tabs (those
that come before any non-blank character) on each line.  It preserves
backspace characters in the output; they decrement the column count for
tab calculations.  By default, tabs are set at every 8th column.

   The program accepts the following options.  Also see *Note Common
options::.

`-TAB1[,TAB2]...'
`-t TAB1[,TAB2]...'
`--tabs=TAB1[,TAB2]...'
     If only one tab stop is given, set the tabs TAB1 spaces apart
     instead of the default 8.  Otherwise, set the tabs at columns
     TAB1, TAB2, ... (numbered from 0), and leave spaces and tabs
     beyond the tabstops given unchanged.  If the tabstops are
     specified with the `-t' or `--tabs' option, they can be separated
     by blanks as well as by commas.  This option implies the `-a'
     option.

`-a'
`--all'
     Convert all strings of two or more spaces or tabs, not just
     initial ones, to tabs.


File: textutils.info, Node: Opening the software toolbox, Next: Index, Prev: Operating on characters, Up: Top

Opening the software toolbox
****************************

   This chapter originally appeared in `Linux Journal', volume 1,
number 2, in the `What's GNU?' column.  It was written by Arnold
Robbins.

* Menu:

* Toolbox introduction::
* I/O redirection::
* The `who' command::
* The `cut' command::
* The `sort' command::
* The `uniq' command::
* Putting the tools together::


File: textutils.info, Node: Toolbox introduction, Next: I/O redirection, Up: Opening the software toolbox

Toolbox introduction
====================

   This month's column is only peripherally related to the GNU Project,
in that it describes a number of the GNU tools on your Linux system and
how they might be used.  What it's really about is the "Software Tools"
philosophy of program development and usage.

   The software tools philosophy was an important and integral concept
in the initial design and development of Unix (of which Linux and GNU
are essentially clones).  Unfortunately, in the modern day press of
Internetworking and flashy GUIs, it seems to have fallen by the
wayside.  This is a shame, since it provides a powerful mental model
for solving many kinds of problems.

   Many people carry a Swiss Army knife around in their pants pockets
(or purse).  A Swiss Army knife is a handy tool to have: it has several
knife blades, a screwdriver, tweezers, toothpick, nail file, corkscrew,
and perhaps a number of other things on it.  For the everyday, small
miscellaneous jobs where you need a simple, general purpose tool, it's
just the thing.

   On the other hand, an experienced carpenter doesn't build a house
using a Swiss Army knife.  Instead, he has a toolbox chock full of
specialized tools--a saw, a hammer, a screwdriver, a plane, and so on.
And he knows exactly when and where to use each tool; you won't catch
him hammering nails with the handle of his screwdriver.

   The Unix developers at Bell Labs were all professional programmers
and trained computer scientists.
They had found that while a one-size-fits-all program might appeal to a
user because there's only one program to use, in practice such programs
are

  a. difficult to write,

  b. difficult to maintain and debug, and

  c. difficult to extend to meet new situations.

   Instead, they felt that programs should be specialized tools.  In
short, each program "should do one thing well."  No more and no less.
Such programs are simpler to design, write, and get right--they only do
one thing.

   Furthermore, they found that with the right machinery for hooking
programs together, the whole was greater than the sum of the parts.  By
combining several special purpose programs, you could accomplish a
specific task that none of the programs was designed for, and
accomplish it much more quickly and easily than if you had to write a
special purpose program.  We will see some (classic) examples of this
further on in the column.  (An important additional point was that, if
necessary, you should take a detour and build any software tools you
may need first, if you don't already have something appropriate in the
toolbox.)


File: textutils.info, Node: I/O redirection, Next: The `who' command, Prev: Toolbox introduction, Up: Opening the software toolbox

I/O redirection
===============

   Hopefully, you are familiar with the basics of I/O redirection in
the shell, in particular the concepts of "standard input," "standard
output," and "standard error".  Briefly, "standard input" is a data
source, where data comes from.  A program should not need to either
know or care if the data source is a disk file, a keyboard, a magnetic
tape, or even a punched card reader.  Similarly, "standard output" is a
data sink, where data goes to.  The program should neither know nor
care where this might be.  Programs that only read their standard
input, do something to the data, and then send it on, are called
"filters", by analogy to filters in a water pipeline.

   With the Unix shell, it's very easy to set up data pipelines:

     program_to_create_data | filter1 | .... | filterN > final.pretty.data

   We start out by creating the raw data; each filter applies some
successive transformation to the data, until by the time it comes out
of the pipeline, it is in the desired form.

   This is fine and good for standard input and standard output.  Where
does standard error come into play?  Well, think about `filter1' in the
pipeline above.  What happens if it encounters an error in the data it
sees?  If it writes an error message to standard output, it will just
disappear down the pipeline into `filter2''s input, and the user will
probably never see it.  So programs need a place where they can send
error messages so that the user will notice them.  This is standard
error, and it is usually connected to your console or window, even if
you have redirected standard output of your program away from your
screen.

   For filter programs to work together, the format of the data has to
be agreed upon.  The most straightforward and easiest format to use is
simply lines of text.  Unix data files are generally just streams of
bytes, with lines delimited by the ASCII LF (Line Feed) character,
conventionally called a "newline" in the Unix literature.  (This is
`'\n'' if you're a C programmer.)  This is the format used by all the
traditional filtering programs.  (Many earlier operating systems had
elaborate facilities and special purpose programs for managing binary
data.
Unix has always shied away from such things, under the philosophy that
it's easiest to simply be able to view and edit your data with a text
editor.)

   OK, enough introduction.  Let's take a look at some of the tools,
and then we'll see how to hook them together in interesting ways.  In
the following discussion, we will only present those command line
options that interest us.  As you should always do, double check your
system documentation for the full story.


File: textutils.info, Node: The `who' command, Next: The `cut' command, Prev: I/O redirection, Up: Opening the software toolbox

The `who' command
=================

   The first program is the `who' command.  By itself, it generates a
list of the users who are currently logged in.  Although I'm writing
this on a single-user system, we'll pretend that several people are
logged in:

     $ who
     arnold   console Jan 22 19:57
     miriam   ttyp0   Jan 23 14:19(:0.0)
     bill     ttyp1   Jan 21 09:32(:0.0)
     arnold   ttyp2   Jan 23 20:48(:0.0)

   Here, the `$' is the usual shell prompt, at which I typed `who'.
There are three people logged in, and I am logged in twice.  On
traditional Unix systems, user names are never more than eight
characters long.  This little bit of trivia will be useful later.  The
output of `who' is nice, but the data is not all that exciting.


File: textutils.info, Node: The `cut' command, Next: The `sort' command, Prev: The `who' command, Up: Opening the software toolbox

The `cut' command
=================

   The next program we'll look at is the `cut' command.  This program
cuts out columns or fields of input data.  For example, we can tell it
to print just the login name and full name from the `/etc/passwd' file.
The `/etc/passwd' file has seven fields, separated by colons:

     arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh

   To get the first and fifth fields, we would use `cut' like this:

     $ cut -d: -f1,5 /etc/passwd
     root:Operator
     ...
     arnold:Arnold D. Robbins
     miriam:Miriam A. Robbins
     ...

   With the `-c' option, `cut' will cut out specific characters (i.e.,
columns) in the input lines.  This command looks like it might be
useful for data filtering.


File: textutils.info, Node: The `sort' command, Next: The `uniq' command, Prev: The `cut' command, Up: Opening the software toolbox

The `sort' command
==================

   Next we'll look at the `sort' command.  This is one of the most
powerful commands on a Unix-style system; one that you will often find
yourself using when setting up fancy data plumbing.

   The `sort' command reads and sorts each file named on the command
line.  It then merges the sorted data and writes it to standard output.
It will read standard input if no files are given on the command line
(thus making it into a filter).  The sort is based on the machine
collating sequence (ASCII) or based on user-supplied ordering criteria.


File: textutils.info, Node: The `uniq' command, Next: Putting the tools together, Prev: The `sort' command, Up: Opening the software toolbox

The `uniq' command
==================

   Finally (at least for now), we'll look at the `uniq' program.  When
sorting data, you will often end up with duplicate lines, lines that
are identical.  Usually, all you need is one instance of each line.
This is where `uniq' comes in.

   The `uniq' program reads its standard input, which it expects to be
sorted.  It only prints out one copy of each duplicated line.  It does
have several options.  Later on, we'll use the `-c' option, which
prints each unique line, preceded by a count of the number of times
that line occurred in the input.
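   As a small, hypothetical preview of `-c' (the input file is
invented, and the exact alignment of the counts may differ):

     $ sort guests.txt | uniq -c
           3 arnold
           1 bill
           2 miriam

   Each distinct line appears once, preceded by the number of times it
occurred in the sorted input.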