This is Info file textutils.info, produced by Makeinfo-1.64 from the
input file /ade-src/fsf/textutils/doc/textutils.texi.

START-INFO-DIR-ENTRY
* Text utilities: (textutils).          GNU text utilities.
* cat: (textutils)cat invocation.       Concatenate and write files.
* cksum: (textutils)cksum invocation.   Print POSIX CRC checksum.
* comm: (textutils)comm invocation.     Compare sorted files by line.
* csplit: (textutils)csplit invocation. Split by context.
* cut: (textutils)cut invocation.       Print selected parts of lines.
* expand: (textutils)expand invocation. Convert tabs to spaces.
* fmt: (textutils)fmt invocation.       Reformat paragraph text.
* fold: (textutils)fold invocation.     Wrap long input lines.
* head: (textutils)head invocation.     Output the first part of files.
* join: (textutils)join invocation.     Join lines on a common field.
* md5sum: (textutils)md5sum invocation. Print or check message-digests.
* nl: (textutils)nl invocation.         Number lines and write files.
* od: (textutils)od invocation.         Dump files in octal, etc.
* paste: (textutils)paste invocation.   Merge lines of files.
* pr: (textutils)pr invocation.         Paginate or columnate files.
* sort: (textutils)sort invocation.     Sort text files.
* split: (textutils)split invocation.   Split into fixed-size pieces.
* sum: (textutils)sum invocation.       Print traditional checksum.
* tac: (textutils)tac invocation.       Reverse files.
* tail: (textutils)tail invocation.     Output the last part of files.
* tr: (textutils)tr invocation.         Translate characters.
* unexpand: (textutils)unexpand invocation. Convert spaces to tabs.
* uniq: (textutils)uniq invocation.     Uniqify files.
* wc: (textutils)wc invocation.         Byte, word, and line counts.
END-INFO-DIR-ENTRY

   This file documents the GNU text utilities.

   Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: textutils.info, Node: Top, Next: Introduction, Up: (dir)

GNU text utilities
******************

   This manual minimally documents version 1.19 of the GNU text
utilities.

* Menu:

* Introduction::                Caveats, overview, and authors.
* Common options::              Common options.
* Output of entire files::      cat tac nl od
* Formatting file contents::    fmt pr fold
* Output of parts of files::    head tail split csplit
* Summarizing files::           wc sum cksum md5sum
* Operating on sorted files::   sort uniq comm
* Operating on fields within a line:: cut paste join
* Operating on characters::     tr expand unexpand
* Opening the software toolbox:: The software tools philosophy.
* Index::                       General index.


File: textutils.info, Node: Introduction, Next: Common options, Prev: Top, Up: Top

Introduction
************

   This manual is incomplete: No attempt is made to explain basic
concepts in a way suitable for novices.  Thus, if you are interested,
please get involved in improving this manual.  The entire GNU community
will benefit.

   The GNU text utilities are mostly compatible with the POSIX.2
standard.

   Please report bugs to `bug-gnu-utils@prep.ai.mit.edu'.  Remember to
include the version number, machine architecture, input files, and any
other information needed to reproduce the bug: your input, what you
expected, what you got, and why it is wrong.  Diffs are welcome, but
please include a description of the problem as well, since this is
sometimes difficult to infer.  *Note Bugs: (gcc)Bugs.

   This manual is based on the Unix man pages in the distribution, which
were originally written by David MacKenzie and updated by Jim Meyering.
The original `fmt' man page was written by Ross Paterson.  François
Pinard did the initial conversion to Texinfo format.  Karl Berry did
the indexing, some reorganization, and editing of the results.  Richard
Stallman contributed his usual invaluable insights to the overall
process.


File: textutils.info, Node: Common options, Next: Output of entire files, Prev: Introduction, Up: Top

Common options
**************

   Certain options are available in all these programs.  Rather than
writing identical descriptions for each of the programs, they are
described here.  (In fact, every GNU program accepts (or should accept)
these options.)

   A few of these programs take arbitrary strings as arguments.  In
those cases, `--help' and `--version' are taken as these options only
if there is one and exactly one command line argument.

`--help'
     Print a usage message listing all available options, then exit
     successfully.

`--version'
     Print the version number, then exit successfully.


File: textutils.info, Node: Output of entire files, Next: Formatting file contents, Prev: Common options, Up: Top

Output of entire files
**********************

   These commands read and write entire files, possibly transforming
them in some way.

* Menu:

* cat invocation::              Concatenate and write files.
* tac invocation::              Concatenate and write files in reverse.
* nl invocation::               Number lines and write files.
* od invocation::               Write files in octal or other formats.


File: textutils.info, Node: cat invocation, Next: tac invocation, Up: Output of entire files

`cat': Concatenate and write files
==================================

   `cat' copies each FILE (`-' means standard input), or standard input
if none are given, to standard output.  Synopsis:

     cat [OPTION] [FILE]...

   The program accepts the following options.  Also see *Note Common
options::.

`-A'
`--show-all'
     Equivalent to `-vET'.

`-b'
`--number-nonblank'
     Number all nonblank output lines, starting with 1.

`-e'
     Equivalent to `-vE'.

`-E'
`--show-ends'
     Display a `$' after the end of each line.

`-n'
`--number'
     Number all output lines, starting with 1.

`-s'
`--squeeze-blank'
     Replace multiple adjacent blank lines with a single blank line.

`-t'
     Equivalent to `-vT'.

`-T'
`--show-tabs'
     Display TAB characters as `^I'.

`-u'
     Ignored; for Unix compatibility.

`-v'
`--show-nonprinting'
     Display control characters except for LFD and TAB using `^'
     notation and precede characters that have the high bit set with
     `M-'.


File: textutils.info, Node: tac invocation, Next: nl invocation, Prev: cat invocation, Up: Output of entire files

`tac': Concatenate and write files in reverse
=============================================

   `tac' copies each FILE (`-' means standard input), or standard input
if none are given, to standard output, reversing the records (lines by
default) in each separately.  Synopsis:

     tac [OPTION]... [FILE]...

   "Records" are separated by instances of a string (newline by
default).  By default, this separator string is attached to the end of
the record that it follows in the file.

   The program accepts the following options.  Also see *Note Common
options::.

`-b'
`--before'
     The separator is attached to the beginning of the record that it
     precedes in the file.

`-r'
`--regex'
     Treat the separator string as a regular expression.

`-s SEPARATOR'
`--separator=SEPARATOR'
     Use SEPARATOR as the record separator, instead of newline.


File: textutils.info, Node: nl invocation, Next: od invocation, Prev: tac invocation, Up: Output of entire files

`nl': Number lines and write files
==================================

   `nl' writes each FILE (`-' means standard input), or standard input
if none are given, to standard output, with line numbers added to some
or all of the lines.  Synopsis:

     nl [OPTION]... [FILE]...

   `nl' decomposes its input into (logical) pages; by default, the line
number is reset to 1 at the top of each logical page.
`nl' treats all of the input files as a single document; it does not
reset line numbers or logical pages between files.

   A logical page consists of three sections: header, body, and footer.
Any of the sections can be empty.  Each can be numbered in a different
style from the others.

   The beginnings of the sections of logical pages are indicated in the
input file by a line containing exactly one of these delimiter strings:

`\:\:\:'
     start of header;

`\:\:'
     start of body;

`\:'
     start of footer.

   The two characters from which these strings are made can be changed
from `\' and `:' via options (see below), but the pattern and length of
each string cannot be changed.

   A section delimiter is replaced by an empty line on output.  Any text
that comes before the first section delimiter string in the input file
is considered to be part of a body section, so `nl' treats a file that
contains no section delimiters as a single body section.

   The program accepts the following options.  Also see *Note Common
options::.

`-b STYLE'
`--body-numbering=STYLE'
     Select the numbering style for lines in the body section of each
     logical page.  When a line is not numbered, the current line number
     is not incremented, but the line number separator character is
     still prepended to the line.  The styles are:

    `a'
          number all lines,

    `t'
          number only nonempty lines (default for body),

    `n'
          do not number lines (default for header and footer),

    `pREGEXP'
          number only lines that contain a match for REGEXP.

`-d CD'
`--section-delimiter=CD'
     Set the section delimiter characters to CD; default is `\:'.  If
     only C is given, the second remains `:'.  (Remember to protect `\'
     or other metacharacters from shell expansion with quotes or extra
     backslashes.)

`-f STYLE'
`--footer-numbering=STYLE'
     Analogous to `--body-numbering'.

`-h STYLE'
`--header-numbering=STYLE'
     Analogous to `--body-numbering'.

`-i NUMBER'
`--page-increment=NUMBER'
     Increment line numbers by NUMBER (default 1).

`-l NUMBER'
`--join-blank-lines=NUMBER'
     Consider NUMBER (default 1) consecutive empty lines to be one
     logical line for numbering, and only number the last one.  Where
     fewer than NUMBER consecutive empty lines occur, do not number
     them.  An empty line is one that contains no characters, not even
     spaces or tabs.

`-n FORMAT'
`--number-format=FORMAT'
     Select the line numbering format (default is `rn'):

    `ln'
          left justified, no leading zeros;

    `rn'
          right justified, no leading zeros;

    `rz'
          right justified, leading zeros.

`-p'
`--no-renumber'
     Do not reset the line number at the start of a logical page.

`-s STRING'
`--number-separator=STRING'
     Separate the line number from the text line in the output with
     STRING (default is TAB).

`-v NUMBER'
`--starting-line-number=NUMBER'
     Set the initial line number on each logical page to NUMBER
     (default 1).

`-w NUMBER'
`--number-width=NUMBER'
     Use NUMBER characters for line numbers (default 6).


File: textutils.info, Node: od invocation, Prev: nl invocation, Up: Output of entire files

`od': Write files in octal or other formats
===========================================

   `od' writes an unambiguous representation of each FILE (`-' means
standard input), or standard input if none are given.  Synopsis:

     od [OPTION]... [FILE]...
     od -C [FILE] [[+]OFFSET [[+]LABEL]]

   Each line of output consists of the offset in the input, followed by
groups of data from the file.  By default, `od' prints the offset in
octal, and each group of file data is two bytes of input printed as a
single octal number.

   The program accepts the following options.  Also see *Note Common
options::.

`-A RADIX'
`--address-radix=RADIX'
     Select the base in which file offsets are printed.  RADIX can be
     one of the following:

    `d'
          decimal;

    `o'
          octal;

    `x'
          hexadecimal;

    `n'
          none (do not print offsets).

     The default is octal.

`-j BYTES'
`--skip-bytes=BYTES'
     Skip BYTES input bytes before formatting and writing.
     If BYTES begins with `0x' or `0X', it is interpreted in
     hexadecimal; otherwise, if it begins with `0', in octal;
     otherwise, in decimal.  Appending `b' multiplies BYTES by 512, `k'
     by 1024, and `m' by 1048576.

`-N BYTES'
`--read-bytes=BYTES'
     Output at most BYTES bytes of the input.  Prefixes and suffixes on
     BYTES are interpreted as for the `-j' option.

`-s [N]'
`--strings[=N]'
     Instead of the normal output, output only "string constants": at
     least N (3 by default) consecutive ASCII graphic characters,
     followed by a null (zero) byte.

`-t TYPE'
`--format=TYPE'
     Select the format in which to output the file data.  TYPE is a
     string of one or more of the below type indicator characters.  If
     you include more than one type indicator character in a single
     TYPE string, or use this option more than once, `od' writes one
     copy of each output line using each of the data types that you
     specified, in the order that you specified.

    `a'
          named character,

    `c'
          ASCII character or backslash escape,

    `d'
          signed decimal,

    `f'
          floating point,

    `o'
          octal,

    `u'
          unsigned decimal,

    `x'
          hexadecimal.

     The type `a' outputs things like `sp' for space, `nl' for newline,
     and `nul' for a null (zero) byte.  Type `c' outputs ` ', `\n', and
     `\0', respectively.

     Except for types `a' and `c', you can specify the number of bytes
     to use in interpreting each number in the given data type by
     following the type indicator character with a decimal integer.
     Alternately, you can specify the size of one of the C compiler's
     built-in data types by following the type indicator character with
     one of the following characters.  For integers (`d', `o', `u',
     `x'):

    `C'
          char,

    `S'
          short,

    `I'
          int,

    `L'
          long.

     For floating point (`f'):

    `F'
          float,

    `D'
          double,

    `L'
          long double.

`-v'
`--output-duplicates'
     Output consecutive lines that are identical.  By default, when two
     or more consecutive output lines would be identical, `od' outputs
     only the first line, and puts just an asterisk on the following
     line to indicate the elision.

`-w[N]'
`--width[=N]'
     Dump N input bytes per output line.  This must be a multiple of
     the least common multiple of the sizes associated with the
     specified output types.  If N is omitted, the default is 32.  If
     this option is not given at all, the default is 16.

   The next several options map the old, pre-POSIX format specification
options to the corresponding POSIX format specs.  GNU `od' accepts any
combination of old- and new-style options.  Format specification
options accumulate.

`-a'
     Output as named characters.  Equivalent to `-ta'.

`-b'
     Output as octal bytes.  Equivalent to `-toC'.

`-c'
     Output as ASCII characters or backslash escapes.  Equivalent to
     `-tc'.

`-d'
     Output as unsigned decimal shorts.  Equivalent to `-tu2'.

`-f'
     Output as floats.  Equivalent to `-tfF'.

`-h'
     Output as hexadecimal shorts.  Equivalent to `-tx2'.

`-i'
     Output as decimal shorts.  Equivalent to `-td2'.

`-l'
     Output as decimal longs.  Equivalent to `-td4'.

`-o'
     Output as octal shorts.  Equivalent to `-to2'.

`-x'
     Output as hexadecimal shorts.  Equivalent to `-tx2'.

`-C'
`--traditional'
     Recognize the pre-POSIX non-option arguments that traditional `od'
     accepted.  The following syntax:

          od --traditional [FILE] [[+]OFFSET[.][b] [[+]LABEL[.][b]]]

     can be used to specify at most one file and optional arguments
     specifying an offset and a pseudo-start address, LABEL.  By
     default, OFFSET is interpreted as an octal number specifying how
     many input bytes to skip before formatting and writing.  The
     optional trailing decimal point forces the interpretation of
     OFFSET as a decimal number.  If no decimal point is specified and
     the offset begins with `0x' or `0X', it is interpreted as a
     hexadecimal number.  If there is a trailing `b', the number of
     bytes skipped will be OFFSET multiplied by 512.  The LABEL
     argument is interpreted just like OFFSET, but it specifies an
     initial pseudo-address.  The pseudo-addresses are displayed in
     parentheses following any normal address.
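
   As an illustration of the `-A', `-t', and `-j' options described in
this node, here is a minimal sketch.  It assumes a POSIX shell with
`printf' and `tr' available; the variable names `hex' and `skipped' are
hypothetical.  The exact column spacing of `od' output can differ
between implementations, so the sketch strips whitespace before using
the result.

```shell
# Dump three bytes as hexadecimal, one byte per group, with no offsets
# (-A n).  tr removes spaces and newlines so only hex digits remain.
hex=$(printf 'abc' | od -A n -t x1 | tr -d ' \n')
echo "$hex"

# Same dump, but skip the first input byte before formatting (-j 1).
skipped=$(printf 'abc' | od -A n -t x1 -j 1 | tr -d ' \n')
echo "$skipped"
```

   Choosing `x1' rather than the two-byte default avoids any dependence
on the byte order of the machine running the command.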


File: textutils.info, Node: Formatting file contents, Next: Output of parts of files, Prev: Output of entire files, Up: Top

Formatting file contents
************************

   These commands reformat the contents of files.

* Menu:

* fmt invocation::              Reformat paragraph text.
* pr invocation::               Paginate or columnate files for printing.
* fold invocation::             Wrap input lines to fit in specified width.


File: textutils.info, Node: fmt invocation, Next: pr invocation, Up: Formatting file contents

`fmt': Reformat paragraph text
==============================

   `fmt' fills and joins lines to produce output lines of (at most) a
given number of characters (75 by default).  Synopsis:

     fmt [OPTION]... [FILE]...

   `fmt' reads from the specified FILE arguments (or standard input if
none are given), and writes to standard output.

   By default, blank lines, spaces between words, and indentation are
preserved in the output; successive input lines with different
indentation are not joined; tabs are expanded on input and introduced on
output.

   `fmt' prefers breaking lines at the end of a sentence, and tries to
avoid line breaks after the first word of a sentence or before the last
word of a sentence.  A "sentence break" is defined as either the end of
a paragraph or a word ending in any of `.?!', followed by two spaces or
end of line, ignoring any intervening parentheses or quotes.  Like TeX,
`fmt' reads entire "paragraphs" before choosing line breaks; the
algorithm is a variant of that in "Breaking Paragraphs Into Lines"
(Donald E. Knuth and Michael F. Plass, `Software--Practice and
Experience', 11 (1981), 1119-1184).

   The program accepts the following options.  Also see *Note Common
options::.

`-c'
`--crown-margin'
     "Crown margin" mode: preserve the indentation of the first two
     lines within a paragraph, and align the left margin of each
     subsequent line with that of the second line.

`-t'
`--tagged-paragraph'
     "Tagged paragraph" mode: like crown margin mode, except that if
     indentation of the first line of a paragraph is the same as the
     indentation of the second, the first line is treated as a one-line
     paragraph.

`-s'
`--split-only'
     Split lines only.  Do not join short lines to form longer ones.
     This prevents sample lines of code, and other such "formatted"
     text from being unduly combined.

`-u'
`--uniform-spacing'
     Uniform spacing.  Reduce spacing between words to one space, and
     spacing between sentences to two spaces.

`-WIDTH'
`-w WIDTH'
`--width=WIDTH'
     Fill output lines up to WIDTH characters (default 75).  `fmt'
     initially tries to make lines about 7% shorter than this, to give
     it room to balance line lengths.

`-p PREFIX'
`--prefix=PREFIX'
     Only lines beginning with PREFIX (possibly preceded by whitespace)
     are subject to formatting.  The prefix and any preceding
     whitespace are stripped for the formatting and then re-attached to
     each formatted output line.  One use is to format certain kinds of
     program comments, while leaving the code unchanged.


File: textutils.info, Node: pr invocation, Next: fold invocation, Prev: fmt invocation, Up: Formatting file contents

`pr': Paginate or columnate files for printing
==============================================

   `pr' writes each FILE (`-' means standard input), or standard input
if none are given, to standard output, paginating and optionally
outputting in multicolumn format.  Synopsis:

     pr [OPTION]... [FILE]...

   By default, a 5-line header is printed: two blank lines; a line with
the date, the file name, and the page count; and two more blank lines.
A five-line footer (entirely blank) is also printed.

   Form feeds in the input cause page breaks in the output.

   The program accepts the following options.  Also see *Note Common
options::.

`+PAGE'
     Begin printing with page PAGE.

`-COLUMN'
     Produce COLUMN-column output and print columns down.
     The column width is automatically decreased as COLUMN increases;
     unless you use the `-w' option to increase the page width as well,
     this option might well cause some input to be truncated.

`-a'
     Print columns across rather than down.

`-b'
     Balance columns on the last page.

`-c'
     Print control characters using hat notation (e.g., `^G'); print
     other unprintable characters in octal backslash notation.  By
     default, unprintable characters are not changed.

`-d'
     Double space the output.

`-e[IN-TABCHAR[IN-TABWIDTH]]'
     Expand tabs to spaces on input.  Optional argument IN-TABCHAR is
     the input tab character (default is TAB).  Second optional
     argument IN-TABWIDTH is the input tab character's width (default
     is 8).

`-f'
     Use a formfeed instead of newlines to separate output pages.

`-h HEADER'
     Replace the file name in the header with the string HEADER.

`-i[OUT-TABCHAR[OUT-TABWIDTH]]'
     Replace spaces with tabs on output.  Optional argument OUT-TABCHAR
     is the output tab character (default is TAB).  Second optional
     argument OUT-TABWIDTH is the output tab character's width (default
     is 8).

`-l N'
     Set the page length to N (default 66) lines.  If N is less than
     10, the headers and footers are omitted, as if the `-t' option had
     been given.

`-m'
     Print all files in parallel, one in each column.

`-n[NUMBER-SEPARATOR[DIGITS]]'
     Precede each column with a line number; with parallel files
     (`-m'), precede each line with a line number.  Optional argument
     NUMBER-SEPARATOR is the character to print after each number
     (default is TAB).  Optional argument DIGITS is the number of
     digits per line number (default is 5).

`-o N'
     Indent each line with a margin N spaces wide (default is zero),
     i.e., set the left margin.  The total page width is N plus the
     width set with the `-w' option.

`-r'
     Do not print a warning message when an argument FILE cannot be
     opened.  (The exit status will still be nonzero, however.)

`-s[C]'
     Separate columns by the single character C.  If C is omitted, the
     default is space; if this option is omitted altogether, the
     default is TAB.

`-t'
     Do not print the usual 5-line header and the 5-line footer on each
     page, and do not fill out the bottoms of pages (with blank lines
     or formfeeds).

`-v'
     Print unprintable characters in octal backslash notation.

`-w N'
     Set the page width to N (default is 72) columns.


File: textutils.info, Node: fold invocation, Prev: pr invocation, Up: Formatting file contents

`fold': Wrap input lines to fit in specified width
==================================================

   `fold' writes each FILE (`-' means standard input), or standard
input if none are given, to standard output, breaking long lines.
Synopsis:

     fold [OPTION]... [FILE]...

   By default, `fold' breaks lines wider than 80 columns.  The output
is split into as many lines as necessary.

   `fold' counts screen columns by default; thus, a tab may count more
than one column, backspace decreases the column count, and carriage
return sets the column to zero.

   The program accepts the following options.  Also see *Note Common
options::.

`-b'
`--bytes'
     Count bytes rather than columns, so that tabs, backspaces, and
     carriage returns are each counted as taking up one column, just
     like other characters.

`-s'
`--spaces'
     Break at word boundaries: the line is broken after the last blank
     before the maximum line length.  If the line contains no such
     blanks, the line is broken at the maximum line length as usual.

`-w WIDTH'
`--width=WIDTH'
     Use a maximum line length of WIDTH columns instead of 80.


File: textutils.info, Node: Output of parts of files, Next: Summarizing files, Prev: Formatting file contents, Up: Top

Output of parts of files
************************

   These commands output pieces of the input.

* Menu:

* head invocation::             Output the first part of files.
* tail invocation::             Output the last part of files.
* split invocation::            Split a file into fixed-size pieces.
* csplit invocation::           Split a file into context-determined pieces.
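
   As a quick orientation for this chapter, the following sketch takes
the first and last pieces of a small input with `head' and `tail'.  It
is only an illustration under stated assumptions: a POSIX shell, the
new-style `-n' option form, and hypothetical variable names (`sample',
`first', `last').

```shell
# Build a five-line sample input (command substitution drops the
# trailing newline, so it is added back when piping).
sample=$(printf '1\n2\n3\n4\n5\n')

# First two lines and last two lines, using the new-style -n option.
first=$(printf '%s\n' "$sample" | head -n 2)
last=$(printf '%s\n' "$sample" | tail -n 2)

printf 'head:\n%s\n' "$first"
printf 'tail:\n%s\n' "$last"
```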


File: textutils.info, Node: head invocation, Next: tail invocation, Up: Output of parts of files

`head': Output the first part of files
======================================

   `head' prints the first part (10 lines by default) of each FILE; it
reads from standard input if no files are given or when given a FILE of
`-'.  Synopses:

     head [OPTION]... [FILE]...
     head -NUMBER [OPTION]... [FILE]...

   If more than one FILE is specified, `head' prints a one-line header
consisting of

     ==> FILE NAME <==

before the output for each FILE.

   `head' accepts two option formats: the new one, in which numbers are
arguments to the options (`-q -n 1'), and the old one, in which the
number precedes any option letters (`-1q').

   The program accepts the following options.  Also see *Note Common
options::.

`-COUNTOPTIONS'
     This option is only recognized if it is specified first.  COUNT is
     a decimal number optionally followed by a size letter (`b', `k',
     `m') as in `-c', or `l' to mean count by lines, or other option
     letters (`cqv').

`-c BYTES'
`--bytes=BYTES'
     Print the first BYTES bytes, instead of initial lines.  Appending
     `b' multiplies BYTES by 512, `k' by 1024, and `m' by 1048576.

`-n N'
`--lines=N'
     Output the first N lines.

`-q'
`--quiet'
`--silent'
     Never print file name headers.

`-v'
`--verbose'
     Always print file name headers.


File: textutils.info, Node: tail invocation, Next: split invocation, Prev: head invocation, Up: Output of parts of files

`tail': Output the last part of files
=====================================

   `tail' prints the last part (10 lines by default) of each FILE; it
reads from standard input if no files are given or when given a FILE of
`-'.  Synopses:

     tail [OPTION]... [FILE]...
     tail -NUMBER [OPTION]... [FILE]...
     tail +NUMBER [OPTION]... [FILE]...

   If more than one FILE is specified, `tail' prints a one-line header
consisting of

     ==> FILE NAME <==

before the output for each FILE.

   GNU `tail' can output any amount of data (some other versions of
`tail' cannot).
It also has no `-r' option (print in reverse), since reversing a file
is really a different job from printing the end of a file; BSD `tail'
(which is the one with `-r') can only reverse files that are at most as
large as its buffer, which is typically 32k.  A more reliable and
versatile way to reverse files is the GNU `tac' command.

   `tail' accepts two option formats: the new one, in which numbers are
arguments to the options (`-n 1'), and the old one, in which the number
precedes any option letters (`-1' or `+1').

   If any option-argument is a number N starting with a `+', `tail'
begins printing with the Nth item from the start of each file, instead
of from the end.

   The program accepts the following options.  Also see *Note Common
options::.

`-COUNT'
`+COUNT'
     This option is only recognized if it is specified first.  COUNT is
     a decimal number optionally followed by a size letter (`b', `k',
     `m') as in `-c', or `l' to mean count by lines, or other option
     letters (`cfqv').

`-c BYTES'
`--bytes=BYTES'
     Output the last BYTES bytes, instead of final lines.  Appending
     `b' multiplies BYTES by 512, `k' by 1024, and `m' by 1048576.

`-f'
`--follow'
     Loop forever trying to read more characters at the end of the
     file, presumably because the file is growing.  Ignored if reading
     from a pipe.  If more than one file is given, `tail' prints a
     header whenever it gets output from a different file, to indicate
     which file that output is from.

`-n N'
`--lines=N'
     Output the last N lines.

`-q'
`--quiet'
`--silent'
     Never print file name headers.

`-v'
`--verbose'
     Always print file name headers.


File: textutils.info, Node: split invocation, Next: csplit invocation, Prev: tail invocation, Up: Output of parts of files

`split': Split a file into fixed-size pieces
============================================

   `split' creates output files containing consecutive sections of
INPUT (standard input if none is given or INPUT is `-').
Synopsis:

     split [OPTION] [INPUT [PREFIX]]

   By default, `split' puts 1000 lines of INPUT (or whatever is left
over for the last section), into each output file.

   The output files' names consist of PREFIX (`x' by default) followed
by a group of letters `aa', `ab', and so on, such that concatenating
the output files in sorted order by file name produces the original
input file.  (If more than 676 output files are required, `split' uses
`zaa', `zab', etc.)

   The program accepts the following options.  Also see *Note Common
options::.

`-LINES'
`-l LINES'
`--lines=LINES'
     Put LINES lines of INPUT into each output file.

`-b BYTES'
`--bytes=BYTES'
     Put BYTES bytes of INPUT into each output file.  Appending `b'
     multiplies BYTES by 512, `k' by 1024, and `m' by 1048576.

`-C BYTES'
`--line-bytes=BYTES'
     Put into each output file as many complete lines of INPUT as
     possible without exceeding BYTES bytes.  For lines longer than
     BYTES bytes, put BYTES bytes into each output file until less than
     BYTES bytes of the line are left, then continue normally.  BYTES
     has the same format as for the `--bytes' option.

`--verbose'
     Write a diagnostic to standard error just before each output file
     is opened.


File: textutils.info, Node: csplit invocation, Prev: split invocation, Up: Output of parts of files

`csplit': Split a file into context-determined pieces
=====================================================

   `csplit' creates zero or more output files containing sections of
INPUT (standard input if INPUT is `-').  Synopsis:

     csplit [OPTION]... INPUT PATTERN...

   The contents of the output files are determined by the PATTERN
arguments, as detailed below.  An error occurs if a PATTERN argument
refers to a nonexistent line of the input file (e.g., if no remaining
line matches a given regular expression).  After every PATTERN has been
matched, any remaining input is copied into one last output file.
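
   The behaviour just described can be sketched with a single
line-number PATTERN.  This is only an illustration: it assumes a POSIX
shell and a `mktemp' that supports `-d', the names `dir', `part0', and
`part1' are hypothetical, and `-s' is the POSIX short form of the
`--silent' option, used here to suppress the byte counts.

```shell
# Use a scratch directory so the generated files are easy to find.
dir=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$dir/input"

# Split before line 3.  With the prefix set by -f, the first output
# file ends in 00 and holds lines 1-2; the remaining input (lines 3-5)
# is copied into one last output file ending in 01.
csplit -s -f "$dir/xx" "$dir/input" 3

part0=$(cat "$dir/xx00")
part1=$(cat "$dir/xx01")
printf 'xx00:\n%s\nxx01:\n%s\n' "$part0" "$part1"
```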

   By default, `csplit' prints the number of bytes written to each
output file after it has been created.

   The types of pattern arguments are:

`N'
     Create an output file containing the input up to but not including
     line N (a positive integer).  If followed by a repeat count, also
     create an output file containing the next N lines of the input
     file once for each repeat.

`/REGEXP/[OFFSET]'
     Create an output file containing the current line up to (but not
     including) the next line of the input file that contains a match
     for REGEXP.  The optional OFFSET is a `+' or `-' followed by a
     positive integer.  If it is given, the input up to the matching
     line plus or minus OFFSET is put into the output file, and the
     line after that begins the next section of input.

`%REGEXP%[OFFSET]'
     Like the previous type, except that it does not create an output
     file, so that section of the input file is effectively ignored.

`{REPEAT-COUNT}'
     Repeat the previous pattern REPEAT-COUNT additional times.
     REPEAT-COUNT can either be a positive integer or an asterisk,
     meaning repeat as many times as necessary until the input is
     exhausted.

   The output files' names consist of a prefix (`xx' by default)
followed by a suffix.  By default, the suffix is an ascending sequence
of two-digit decimal numbers from `00' and up to `99'.  In any case,
concatenating the output files in sorted order by filename produces the
original input file.

   By default, if `csplit' encounters an error or receives a hangup,
interrupt, quit, or terminate signal, it removes any output files that
it has created so far before it exits.

   The program accepts the following options.  Also see *Note Common
options::.

`-f PREFIX'
`--prefix=PREFIX'
     Use PREFIX as the output file name prefix.

`-b SUFFIX'
`--suffix=SUFFIX'
     Use SUFFIX as the output file name suffix.
     When this option is specified, the suffix string must include
     exactly one `printf(3)'-style conversion specification, possibly
     including format specification flags, a field width, a precision
     specification, or all of these kinds of modifiers.  The format
     letter must convert a binary integer argument to readable form;
     thus, only `d', `i', `u', `o', `x', and `X' conversions are
     allowed.  The entire SUFFIX is given (with the current output file
     number) to `sprintf(3)' to form the file name suffixes for each of
     the individual output files in turn.  If this option is used, the
     `--digits' option is ignored.

`-n DIGITS'
`--digits=DIGITS'
     Use output file names containing numbers that are DIGITS digits
     long instead of the default 2.

`-k'
`--keep-files'
     Do not remove output files when errors are encountered.

`-z'
`--elide-empty-files'
     Suppress the generation of zero-length output files.  (In cases
     where the section delimiters of the input file are supposed to
     mark the first lines of each of the sections, the first output
     file will generally be a zero-length file unless you use this
     option.)  The output file sequence numbers always run
     consecutively starting from 0, even when this option is specified.

`-s'
`-q'
`--silent'
`--quiet'
     Do not print counts of output file sizes.


File: textutils.info, Node: Summarizing files, Next: Operating on sorted files, Prev: Output of parts of files, Up: Top

Summarizing files
*****************

   These commands generate just a few numbers representing entire
contents of files.

* Menu:

* wc invocation::               Print byte, word, and line counts.
* sum invocation::              Print checksum and block counts.
* cksum invocation::            Print CRC checksum and byte counts.
* md5sum invocation::           Print or check message-digests.
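
   To make "just a few numbers" concrete, here is a small sketch that
uses `wc' on standard input.  It assumes a POSIX shell; the variable
names `lines', `words', and `bytes' are hypothetical.  Because `wc'
pads its columns with spaces, the sketch lets the shell split the
output into fields rather than comparing the raw text.

```shell
# The input has 2 newlines, 3 whitespace-separated words, and 14 bytes
# ("one two\n" is 8 bytes, "three\n" is 6).  With no file argument, wc
# prints only the three counts, in the order newlines, words, bytes.
set -- $(printf 'one two\nthree\n' | wc)
lines=$1
words=$2
bytes=$3
echo "$lines $words $bytes"
```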

File: textutils.info, Node: wc invocation, Next: sum invocation, Up: Summarizing files

`wc': Print byte, word, and line counts
=======================================

   `wc' counts the number of bytes, whitespace-separated words, and
newlines in each given FILE, or standard input if none are given or
for a FILE of `-'.  Synopsis:

     wc [OPTION]... [FILE]...

   `wc' prints one line of counts for each file, and if the file was
given as an argument, it prints the file name following the counts.
If more than one FILE is given, `wc' prints a final line containing
the cumulative counts, with the file name `total'.  The counts are
printed in this order: newlines, words, bytes.

   By default, `wc' prints all three counts.  Options can specify
that only certain counts be printed.  Options do not undo others
previously given, so

     wc --bytes --words

prints both the byte counts and the word counts.

   The program accepts the following options.  Also see *Note Common
options::.

`--bytes'
`--chars'
     Print only the byte counts.

`--words'
     Print only the word counts.

`--lines'
     Print only the newline counts.


File: textutils.info, Node: sum invocation, Next: cksum invocation, Prev: wc invocation, Up: Summarizing files

`sum': Print checksum and block counts
======================================

   `sum' computes a 16-bit checksum for each given FILE, or standard
input if none are given or for a FILE of `-'.  Synopsis:

     sum [OPTION]... [FILE]...

   `sum' prints the checksum for each FILE followed by the number of
blocks in the file (rounded up).  If more than one FILE is given, file
names are also printed (by default).  (With the `--sysv' option, the
corresponding file names are printed when there is at least one file
argument.)

   By default, GNU `sum' computes checksums using an algorithm
compatible with BSD `sum' and prints file sizes in units of 1024-byte
blocks.

   The program accepts the following options.  Also see *Note Common
options::.

`-r'
     Use the default (BSD compatible) algorithm.
     This option is included for compatibility with the System V
     `sum'.  Unless `-s' was also given, it has no effect.

`-s'
`--sysv'
     Compute checksums using an algorithm compatible with System V
     `sum''s default, and print file sizes in units of 512-byte
     blocks.

   `sum' is provided for compatibility; the `cksum' program (see next
section) is preferable in new applications.


File: textutils.info, Node: cksum invocation, Next: md5sum invocation, Prev: sum invocation, Up: Summarizing files

`cksum': Print CRC checksum and byte counts
===========================================

   `cksum' computes a cyclic redundancy check (CRC) checksum for each
given FILE, or standard input if none are given or for a FILE of `-'.
Synopsis:

     cksum [OPTION]... [FILE]...

   `cksum' prints the CRC checksum for each file along with the number
of bytes in the file, and the filename unless no arguments were given.

   `cksum' is typically used to ensure that files transferred by
unreliable means (e.g., netnews) have not been corrupted, by comparing
the `cksum' output for the received files with the `cksum' output for
the original files (typically given in the distribution).

   The CRC algorithm is specified by the POSIX.2 standard.  It is not
compatible with the BSD or System V `sum' algorithms (see the previous
section); it is more robust.

   The only options are `--help' and `--version'.  *Note Common
options::.


File: textutils.info, Node: md5sum invocation, Prev: cksum invocation, Up: Summarizing files

`md5sum': Print or check message-digests
========================================

   `md5sum' computes a 128-bit checksum (or "fingerprint" or
"message-digest") for each specified FILE.  If a FILE is specified as
`-' or if no files are given, `md5sum' computes the checksum for the
standard input.  `md5sum' can also determine whether a file and
checksum are consistent.  Synopsis:

     md5sum [OPTION]... [FILE]...
     md5sum [OPTION]... --check [FILE]
     md5sum [OPTION]... --string=STRING ...
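
   A minimal sketch of the generate-then-verify cycle that the
`--check' and `--status' options support.  The file names `data.txt'
and `data.md5' are invented for the example, and the wording of the
per-file report printed by `--check' may differ between versions.

```shell
# Create a sample file (hypothetical name) and record its MD5 checksum.
printf 'some data\n' > data.txt
md5sum data.txt > data.md5

# Verify the file against the recorded checksum; the exit status is
# zero when every listed file matches.
md5sum --check data.md5

# Change the file, then verify silently and rely on the exit status alone.
printf 'tampered\n' > data.txt
if md5sum --check --status data.md5; then
    echo 'checksums match'
else
    echo 'checksum mismatch'
fi
```

   Keeping the checksum list in a separate file mirrors the common
practice of shipping checksums alongside a distribution.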

   For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
a binary or text input file, and the filename.  If FILE is omitted or
specified as `-', standard input is read.

   The program accepts the following options.  Also see *Note Common
options::.

`--binary'
     Treat all input files as binary.  This option has no effect on
     Unix systems, since they don't distinguish between binary and
     text files.  This option is useful on systems that have different
     internal and external character representations.

`--check'
     Read filenames and checksum information from the single FILE (or
     from stdin if no FILE was specified) and report whether each
     named file and the corresponding checksum data are consistent.
     The input to this mode of `md5sum' is usually the output of a
     prior, checksum-generating run of `md5sum'.  Each valid line of
     input consists of an MD5 checksum, a binary/text flag, and then a
     filename.  Binary files are marked with `*', text with ` '.  For
     each such line, `md5sum' reads the named file and computes its
     MD5 checksum.  Then, if the computed message digest does not
     match the one on the line with the filename, the file is noted as
     having failed the test.  Otherwise, the file passes the test.  By
     default, for each valid line, one line is written to standard
     output indicating whether the named file passed the test.  After
     all checks have been performed, if there were any failures, a
     warning is issued to standard error.  Use the `--status' option
     to inhibit that output.  If any listed file cannot be opened or
     read, if any valid line has an MD5 checksum inconsistent with the
     associated file, or if no valid line is found, `md5sum' exits
     with nonzero status.  Otherwise, it exits successfully.

`--status'
     This option is useful only when verifying checksums.  When
     verifying checksums, don't generate the default one-line-per-file
     diagnostic and don't output the warning summarizing any failures.
     Failures to open or read a file still evoke individual
     diagnostics to standard error.
     If all listed files are readable and are consistent with the
     associated MD5 checksums, exit successfully.  Otherwise exit with
     a status code indicating there was a failure.

`--string=STRING'
     Compute the message digest for STRING, instead of for a file.
     The result is the same as for a file that contains exactly
     STRING.

`--text'
     Treat all input files as text files.  This is the reverse of
     `--binary'.

`--warn'
     When verifying checksums, warn about improperly formatted MD5
     checksum lines.  This option is useful only if all but a few
     lines in the checked input are valid.


File: textutils.info, Node: Operating on sorted files, Next: Operating on fields within a line, Prev: Summarizing files, Up: Top

Operating on sorted files
*************************

   These commands work with (or produce) sorted files.

* Menu:

* sort invocation::             Sort text files.
* uniq invocation::             Uniqify files.
* comm invocation::             Compare two sorted files line by line.