- printf format, "----", "------" @}
- @{ printf format, $1, $2 @}' BBS-list
- @end example
-
- See if you can use the @code{printf} statement to line up the headings and
- table data for our @file{inventory-shipped} example covered earlier in the
- section on the @code{print} statement (@pxref{Print}).
-
- @node Redirection, Special Files, Printf, Printing
- @section Redirecting Output of @code{print} and @code{printf}
-
- @cindex output redirection
- @cindex redirection of output
- So far we have been dealing only with output that prints to the standard
- output, usually your terminal. Both @code{print} and @code{printf} can be
- told to send their output to other places. This is called
- @dfn{redirection}.@refill
-
- A redirection appears after the @code{print} or @code{printf} statement.
- Redirections in @code{awk} are written just like redirections in shell
- commands, except that they are written inside the @code{awk} program.
-
- @menu
- * File/Pipe Redirection:: Redirecting Output to Files and Pipes.
- * Close Output:: How to close output files and pipes.
- @end menu
-
- @node File/Pipe Redirection, Close Output, Redirection, Redirection
- @subsection Redirecting Output to Files and Pipes
-
- Here are the three forms of output redirection. They are all shown for
- the @code{print} statement, but they work identically for
- @code{printf}.
-
- @table @code
- @item print @var{items} > @var{output-file}
- This type of redirection prints the items onto the output file
- @var{output-file}. The file name @var{output-file} can be any
- expression. Its value is changed to a string and then used as a
- file name (@pxref{Expressions}).@refill
-
- When this type of redirection is used, the @var{output-file} is erased
- before the first output is written to it. Subsequent writes do not
- erase @var{output-file}, but append to it. If @var{output-file} does
- not exist, then it is created.@refill
-
- For example, here is how one @code{awk} program can write a list of
- BBS names to a file @file{name-list} and a list of phone numbers to a
- file @file{phone-list}. Each output file contains one name or number
- per line.
-
- @example
- awk '@{ print $2 > "phone-list"
- print $1 > "name-list" @}' BBS-list
- @end example
-
- @item print @var{items} >> @var{output-file}
- This type of redirection prints the items onto the output file
- @var{output-file}. The difference between this and the
- single-@samp{>} redirection is that the old contents (if any) of
- @var{output-file} are not erased. Instead, the @code{awk} output is
- appended to the file.
-
- @cindex pipes for output
- @cindex output, piping
- @item print @var{items} | @var{command}
- It is also possible to send output through a @dfn{pipe} instead of into a
- file. This type of redirection opens a pipe to @var{command} and writes
- the values of @var{items} through this pipe, to another process created
- to execute @var{command}.@refill
-
- The redirection argument @var{command} is actually an @code{awk}
- expression. Its value is converted to a string, whose contents give the
- shell command to be run.
-
- For example, this produces two files, one unsorted list of BBS names
- and one list sorted in reverse alphabetical order:
-
- @example
- awk '@{ print $1 > "names.unsorted"
- print $1 | "sort -r > names.sorted" @}' BBS-list
- @end example
-
- Here the unsorted list is written with an ordinary redirection while
- the sorted list is written by piping through the @code{sort} utility.
-
- Here is an example that uses redirection to mail a message to a mailing
- list @samp{bug-system}. This might be useful when trouble is encountered
- in an @code{awk} script run periodically for system maintenance.
-
- @example
- print "Awk script failed:", $0 | "mail bug-system"
- print "at record number", FNR, "of", FILENAME | "mail bug-system"
- close("mail bug-system")
- @end example
-
- We call the @code{close} function here because it's a good idea to close
- the pipe as soon as all the intended output has been sent to it.
- @xref{Close Output}, for more information on this.
- @end table
-
- Redirecting output using @samp{>}, @samp{>>}, or @samp{|} asks the system
- to open a file or pipe only if the particular @var{file} or @var{command}
- you've specified has not already been written to by your program.@refill
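-
- The following shell script is a minimal sketch of this behavior. The
- scratch directory and the variable name @code{out} are assumptions made
- for illustration. The file is erased when the first record is written
- and later records are appended, so the old contents disappear but all
- three new lines survive.

```shell
# Scratch directory holding a file with pre-existing contents.
dir=$(mktemp -d)
printf 'old contents\n' > "$dir/out"

# The single ">" truncates the file once, on the first print;
# the second and third records are appended to it.
printf 'one\ntwo\nthree\n' | awk -v out="$dir/out" '{ print > out }'

result=$(cat "$dir/out")
rm -rf "$dir"
echo "$result"
```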
-
- @node Close Output, , File/Pipe Redirection, Redirection
- @subsection Closing Output Files and Pipes
- @cindex closing output files and pipes
- @findex close
-
- When a file or pipe is opened, the file name or command associated with
- it is remembered by @code{awk} and subsequent writes to the same file or
- command are appended to the previous writes. The file or pipe stays
- open until @code{awk} exits. This is usually convenient.
-
- Sometimes there is a reason to close an output file or pipe earlier
- than that. To do this, use the @code{close} function, as follows:
-
- @example
- close(@var{filename})
- @end example
-
- @noindent
- or
-
- @example
- close(@var{command})
- @end example
-
- The argument @var{filename} or @var{command} can be any expression.
- Its value must exactly equal the string used to open the file or pipe
- to begin with---for example, if you open a pipe with this:
-
- @example
- print $1 | "sort -r > names.sorted"
- @end example
-
- @noindent
- then you must close it with this:
-
- @example
- close("sort -r > names.sorted")
- @end example
-
- Here are some reasons why you might need to close an output file:
-
- @itemize @bullet
- @item
- To write a file and read it back later on in the same @code{awk}
- program. Close the file when you are finished writing it; then
- you can start reading it with @code{getline} (@pxref{Getline}).
-
- @item
- To write numerous files, successively, in the same @code{awk}
- program. If you don't close the files, eventually you will exceed the
- system limit on the number of open files in one process. So close
- each one when you are finished writing it.
-
- @item
- To make a command finish. When you redirect output through a pipe,
- the command reading the pipe normally continues to try to read input
- as long as the pipe is open. Often this means the command cannot
- really do its work until the pipe is closed. For example, if you
- redirect output to the @code{mail} program, the message is not
- actually sent until the pipe is closed.
-
- @item
- To run the same program a second time, with the same arguments.
- This is not the same thing as giving more input to the first run!
-
- For example, suppose you pipe output to the @code{mail} program. If you
- output several lines redirected to this pipe without closing it, they make
- a single message of several lines. By contrast, if you close the pipe
- after each line of output, then each line makes a separate message.
- @end itemize
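-
- The first reason in the list above can be sketched as follows. The
- scratch file and the variable names are assumptions made for
- illustration; the point is only that @code{close} is called between
- writing the file and reading it back with @code{getline}.

```shell
# Scratch file; its name is arbitrary.
tmpfile=$(mktemp)

readback=$(awk -v out="$tmpfile" 'BEGIN {
    for (i = 1; i <= 3; i++)
        print "line " i > out
    close(out)                      # finish writing before reading
    while ((getline line < out) > 0)
        n++                         # count the lines read back
    close(out)
    print n " lines read back"
}')

rm -f "$tmpfile"
echo "$readback"
```

- Without the first @code{close}, the data might still be sitting in an
- output buffer when @code{getline} starts reading.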
-
- @node Special Files, , Redirection, Printing
- @section Standard I/O Streams
- @cindex standard input
- @cindex standard output
- @cindex standard error output
- @cindex file descriptors
-
- Running programs conventionally have three input and output streams
- already available to them for reading and writing. These are known as
- the @dfn{standard input}, @dfn{standard output}, and @dfn{standard error
- output}. These streams are, by default, terminal input and output, but
- they are often redirected with the shell, via the @samp{<}, @samp{<<},
- @samp{>}, @samp{>>}, @samp{>&} and @samp{|} operators. Standard error
- is used only for writing error messages; the reason we have two separate
- streams, standard output and standard error, is so that they can be
- redirected separately.
-
- @c @cindex differences between @code{gawk} and @code{awk}
- In other implementations of @code{awk}, the only way to write an error
- message to standard error in an @code{awk} program is as follows:
-
- @example
- print "Serious error detected!\n" | "cat 1>&2"
- @end example
-
- @noindent
- This works by opening a pipeline to a shell command which can access the
- standard error stream which it inherits from the @code{awk} process.
- This is far from elegant, and is also inefficient, since it requires a
- separate process. So people writing @code{awk} programs have often
- neglected to do this. Instead, they have sent the error messages to the
- terminal, like this:
-
- @example
- NF != 4 @{
- printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
- @}
- @end example
-
- @noindent
- This has the same effect most of the time, but not always: although the
- standard error stream is usually the terminal, it can be redirected, and
- when that happens, writing to the terminal is not correct. In fact, if
- @code{awk} is run from a background job, it may not have a terminal at all.
- Then opening @file{/dev/tty} will fail.
-
- @code{gawk} provides special file names for accessing the three standard
- streams. When you redirect input or output in @code{gawk}, if the file name
- matches one of these special names, then @code{gawk} directly uses the
- stream it stands for.
-
- @cindex @file{/dev/stdin}
- @cindex @file{/dev/stdout}
- @cindex @file{/dev/stderr}
- @cindex @file{/dev/fd/}
- @table @file
- @item /dev/stdin
- The standard input (file descriptor 0).
-
- @item /dev/stdout
- The standard output (file descriptor 1).
-
- @item /dev/stderr
- The standard error output (file descriptor 2).
-
- @item /dev/fd/@var{n}
- The file associated with file descriptor @var{n}. Such a file must have
- been opened by the program initiating the @code{awk} execution (typically
- the shell). Unless you take special pains, only descriptors 0, 1 and 2
- are available.
- @end table
-
- The file names @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr}
- are aliases for @file{/dev/fd/0}, @file{/dev/fd/1}, and @file{/dev/fd/2},
- respectively, but they are more self-explanatory.
-
- The proper way to write an error message in a @code{gawk} program
- is to use @file{/dev/stderr}, like this:
-
- @example
- NF != 4 @{
- printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
- @}
- @end example
-
- Recognition of these special file names is disabled if @code{gawk} is in
- compatibility mode (@pxref{Command Line}).
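-
- As a rough sketch of why the separate stream matters, the shell script
- below captures standard output and standard error separately. It
- assumes an @code{awk} that recognizes @file{/dev/stderr}, or a system
- that provides that file itself; the sample input and the shell variable
- names are made up for illustration.

```shell
# Scratch file that receives whatever awk writes to standard error.
errfile=$(mktemp)

# Records with four fields are data; anything else is reported on
# standard error and skipped.
good=$(printf 'a b c d\nx y\ne f g h\n' |
    awk 'NF != 4 {
             printf("line %d skipped: does not have 4 fields\n", FNR) > "/dev/stderr"
             next
         }
         { print $1 }' 2>"$errfile")

bad=$(cat "$errfile")
rm -f "$errfile"
echo "stdout: $good"
echo "stderr: $bad"
```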
-
- @node One-liners, Patterns, Printing, Top
- @chapter Useful ``One-liners''
-
- @cindex one-liners
- Useful @code{awk} programs are often short, just a line or two. Here is a
- collection of useful, short programs to get you started. Some of these
- programs contain constructs that haven't been covered yet. The description
- of the program will give you a good idea of what is going on, but please
- read the rest of the manual to become an @code{awk} expert!
-
- @table @code
- @item awk '@{ num_fields = num_fields + NF @}
- @itemx @ @ @ @ @ END @{ print num_fields @}'
- This program prints the total number of fields in all input lines.
-
- @item awk 'length($0) > 80'
- This program prints every line longer than 80 characters. The sole
- rule has a relational expression as its pattern, and has no action (so the
- default action, printing the record, is used).
-
- @item awk 'NF > 0'
- This program prints every line that has at least one field. This is an
- easy way to delete blank lines from a file (or rather, to create a new
- file similar to the old file but from which the blank lines have been
- deleted).
-
- @item awk '@{ if (NF > 0) print @}'
- This program also prints every line that has at least one field. Here we
- allow the rule to match every line, then decide in the action whether
- to print.
-
- @item awk@ 'BEGIN@ @{@ for (i = 1; i <= 7; i++)
- @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ print int(101 * rand()) @}'
- This program prints 7 random numbers from 0 to 100, inclusive.
-
- @item ls -l @var{files} | awk '@{ x += $4 @} ; END @{ print "total bytes: " x @}'
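- This program prints the total number of bytes used by @var{files}. It
- assumes that the size is the fourth field of the @samp{ls -l} output;
- on systems where @samp{ls -l} also lists the group, the size is the
- fifth field, so use @code{$5} instead.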
-
- @item expand@ @var{file}@ |@ awk@ '@{ if (x < length()) x = length() @}
- @itemx @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ END @{ print "maximum line length is " x @}'
- This program prints the maximum line length of @var{file}. The input
- is piped through the @code{expand} program to change tabs into spaces,
- so the widths compared are actually the right-margin columns.
- @end table
-
- @node Patterns, Actions, One-liners, Top
- @chapter Patterns
- @cindex pattern, definition of
-
- Patterns in @code{awk} control the execution of rules: a rule is
- executed when its pattern matches the current input record. This
- chapter tells all about how to write patterns.
-
- @menu
- * Kinds of Patterns:: A list of all kinds of patterns.
- The following subsections describe them in detail.
-
- * Empty:: The empty pattern, which matches every record.
-
- * Regexp:: Regular expressions such as @samp{/foo/}.
-
- * Comparison Patterns:: Comparison expressions such as @code{$1 > 10}.
-
- * Boolean Patterns:: Combining comparison expressions.
-
- * Expression Patterns:: Any expression can be used as a pattern.
-
- * Ranges:: Using pairs of patterns to specify record ranges.
-
- * BEGIN/END:: Specifying initialization and cleanup rules.
- @end menu
-
- @node Kinds of Patterns, Empty, Patterns, Patterns
- @section Kinds of Patterns
- @cindex patterns, types of
-
- Here is a summary of the types of patterns supported in @code{awk}.
-
- @table @code
- @item /@var{regular expression}/
- A regular expression as a pattern. It matches when the text of the
- input record fits the regular expression. (@xref{Regexp, , Regular
- Expressions as Patterns}.)
-
- @item @var{expression}
- A single expression. It matches when its value is nonzero (if a
- number) or nonnull (if a string). (@xref{Expression Patterns}.)
-
- @item @var{pat1}, @var{pat2}
- A pair of patterns separated by a comma, specifying a range of records.
- (@xref{Ranges, , Specifying Record Ranges With Patterns}.)
-
- @item BEGIN
- @itemx END
- Special patterns to supply start-up or clean-up information to
- @code{awk}. (@xref{BEGIN/END}.)
-
- @item @var{null}
- The empty pattern matches every input record. (@xref{Empty, , The Empty
- Pattern}.)
- @end table
-
- @node Empty, Regexp, Kinds of Patterns, Patterns
- @section The Empty Pattern
-
- @cindex empty pattern
- @cindex pattern, empty
- An empty pattern is considered to match @emph{every} input record. For
- example, the program:@refill
-
- @example
- awk '@{ print $1 @}' BBS-list
- @end example
-
- @noindent
- prints just the first field of every record.
-
- @node Regexp, Comparison Patterns, Empty, Patterns
- @section Regular Expressions as Patterns
- @cindex pattern, regular expressions
- @cindex regexp
- @cindex regular expressions as patterns
-
- A @dfn{regular expression}, or @dfn{regexp}, is a way of describing a
- class of strings. A regular expression enclosed in slashes (@samp{/})
- is an @code{awk} pattern that matches every input record whose text
- belongs to that class.
-
- The simplest regular expression is a sequence of letters, numbers, or
- both. Such a regexp matches any string that contains that sequence.
- Thus, the regexp @samp{foo} matches any string containing @samp{foo}.
- Therefore, the pattern @code{/foo/} matches any input record containing
- @samp{foo}. Other kinds of regexps let you specify more complicated
- classes of strings.
-
- @menu
- * Usage: Regexp Usage. How regexps are used in patterns.
- * Operators: Regexp Operators. How to write a regexp.
- * Case-sensitivity:: How to do case-insensitive matching.
- @end menu
-
- @node Regexp Usage, Regexp Operators, Regexp, Regexp
- @subsection How to Use Regular Expressions
-
- A regular expression can be used as a pattern by enclosing it in
- slashes. Then the regular expression is tested against the entire text
- of each record. (The test succeeds if the regexp matches any part of
- that text.) For example, this prints the second field of each
- record that contains @samp{foo} anywhere:
-
- @example
- awk '/foo/ @{ print $2 @}' BBS-list
- @end example
-
- @cindex regular expression matching operators
- @cindex string-matching operators
- @cindex operators, string-matching
- @cindex operators, regular expression matching
- @cindex regexp search operators
- Regular expressions can also be used in comparison expressions. Then
- you can specify the string to match against; it need not be the entire
- current input record. These comparison expressions can be used as
- patterns or in @code{if} and @code{while} statements.
-
- @table @code
- @item @var{exp} ~ /@var{regexp}/
- This is true if the expression @var{exp} (taken as a character string)
- is matched by @var{regexp}. The following example matches, or selects,
- all input records with the upper-case letter @samp{J} somewhere in the
- first field:@refill
-
- @example
- awk '$1 ~ /J/' inventory-shipped
- @end example
-
- So does this:
-
- @example
- awk '@{ if ($1 ~ /J/) print @}' inventory-shipped
- @end example
-
- @item @var{exp} !~ /@var{regexp}/
- This is true if the expression @var{exp} (taken as a character string)
- is @emph{not} matched by @var{regexp}. The following example matches,
- or selects, all input records whose first field @emph{does not} contain
- the upper-case letter @samp{J}:@refill
-
- @example
- awk '$1 !~ /J/' inventory-shipped
- @end example
- @end table
-
- @cindex computed regular expressions
- @cindex regular expressions, computed
- @cindex dynamic regular expressions
- The right hand side of a @samp{~} or @samp{!~} operator need not be a
- constant regexp (i.e., a string of characters between slashes). It may
- be any expression. The expression is evaluated, and converted if
- necessary to a string; the contents of the string are used as the
- regexp. A regexp that is computed in this way is called a @dfn{dynamic
- regexp}. For example:
-
- @example
- identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
- $0 ~ identifier_regexp
- @end example
-
- @noindent
- sets @code{identifier_regexp} to a regexp that describes @code{awk}
- variable names, and tests whether the input record contains a match
- for this regexp.
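-
- A dynamic regexp is most useful when part of it is not known until the
- program runs. Here is a minimal sketch in which the regexp is built
- from a value given on the command line; the variable names
- @code{prefix} and @code{re} and the sample input are hypothetical.

```shell
# Print the second field of records whose first field starts with
# the string passed in as "prefix".
matches=$(printf 'alpha 1\nbeta 2\ngamma 3\n' |
    awk -v prefix="be" '
        BEGIN   { re = "^" prefix }   # the string "^be" is used as the regexp
        $1 ~ re { print $2 }')
echo "$matches"
```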
-
- @node Regexp Operators, Case-sensitivity, Regexp Usage, Regexp
- @subsection Regular Expression Operators
- @cindex metacharacters
- @cindex regular expression metacharacters
-
- You can combine regular expressions with the following characters,
- called @dfn{regular expression operators}, or @dfn{metacharacters}, to
- increase the power and versatility of regular expressions.
-
- Here is a table of metacharacters. All characters not listed in the
- table stand for themselves.
-
- @table @code
- @item ^
- This matches the beginning of the string or the beginning of a line
- within the string. For example:
-
- @example
- ^@@chapter
- @end example
-
- @noindent
- matches the @samp{@@chapter} at the beginning of a string, and can be used
- to identify chapter beginnings in Texinfo source files.
-
- @item $
- This is similar to @samp{^}, but it matches only at the end of a string
- or the end of a line within the string. For example:
-
- @example
- p$
- @end example
-
- @noindent
- matches a record that ends with a @samp{p}.
-
- @item .
- This matches any single character except a newline. For example:
-
- @example
- .P
- @end example
-
- @noindent
- matches any single character followed by a @samp{P} in a string. Using
- concatenation we can make regular expressions like @samp{U.A}, which
- matches any three-character sequence that begins with @samp{U} and ends
- with @samp{A}.
-
- @item [@dots{}]
- This is called a @dfn{character set}. It matches any one of the
- characters that are enclosed in the square brackets. For example:
-
- @example
- [MVX]
- @end example
-
- @noindent
- matches any of the characters @samp{M}, @samp{V}, or @samp{X} in a
- string.@refill
-
- Ranges of characters are indicated by using a hyphen between the beginning
- and ending characters, and enclosing the whole thing in brackets. For
- example:@refill
-
- @example
- [0-9]
- @end example
-
- @noindent
- matches any digit.
-
- To include the character @samp{\}, @samp{]}, @samp{-} or @samp{^} in a
- character set, put a @samp{\} in front of it. For example:
-
- @example
- [d\]]
- @end example
-
- @noindent
- matches either @samp{]}, or @samp{d}.@refill
-
- This treatment of @samp{\} is compatible with other @code{awk}
- implementations but incompatible with the proposed POSIX specification
- for @code{awk}. The current draft specifies the use of the same syntax
- used in @code{egrep}.
-
- We may change @code{gawk} to fit the standard, once we are sure it will
- no longer change. For the meanwhile, the @samp{-a} option specifies the
- traditional @code{awk} syntax described above (which is also the
- default), while the @samp{-e} option specifies @code{egrep} syntax.
- @xref{Options}.
-
- In @code{egrep} syntax, backslash is not syntactically special within
- square brackets. This means that special tricks have to be used to
- represent the characters @samp{]}, @samp{-} and @samp{^} as members of a
- character set.
-
- To match @samp{-}, write it as @samp{---}, which is a range containing
- only @samp{-}. You may also give @samp{-} as the first or last
- character in the set. To match @samp{^}, put it anywhere except as the
- first character of a set. To match a @samp{]}, make it the first
- character in the set. For example:
-
- @example
- []d^]
- @end example
-
- @noindent
- matches either @samp{]}, @samp{d} or @samp{^}.@refill
-
- @item [^ @dots{}]
- This is a @dfn{complemented character set}. The first character after
- the @samp{[} @emph{must} be a @samp{^}. It matches any single character
- @emph{except} those in the square brackets. For example:
-
- @example
- [^0-9]
- @end example
-
- @noindent
- matches any character that is not a digit.
-
- @item |
- This is the @dfn{alternation operator} and it is used to specify
- alternatives. For example:
-
- @example
- ^P|[0-9]
- @end example
-
- @noindent
- matches any string that matches either @samp{^P} or @samp{[0-9]}. This
- means it matches any string that contains a digit or starts with @samp{P}.
-
- The alternation applies to the largest possible regexps on either side.
-
- @item (@dots{})
- Parentheses are used for grouping in regular expressions as in
- arithmetic. They can be used to concatenate regular expressions
- containing the alternation operator, @samp{|}.
-
- @item *
- This symbol means that the preceding regular expression is to be
- repeated as many times as possible to find a match. For example:
-
- @example
- ph*
- @end example
-
- @noindent
- applies the @samp{*} symbol to the preceding @samp{h} and looks for matches
- to one @samp{p} followed by any number of @samp{h}s. This will also match
- just @samp{p} if no @samp{h}s are present.
-
- The @samp{*} repeats the @emph{smallest} possible preceding expression.
- (Use parentheses if you wish to repeat a larger expression.) It finds
- as many repetitions as possible. For example:
-
- @example
- awk '/\(c[ad][ad]*r x\)/ @{ print @}' sample
- @end example
-
- @noindent
- prints every record in the input containing a string of the form
- @samp{(car x)}, @samp{(cdr x)}, @samp{(cadr x)}, and so on.@refill
-
- @item +
- This symbol is similar to @samp{*}, but the preceding expression must be
- matched at least once. This means that:
-
- @example
- wh+y
- @end example
-
- @noindent
- would match @samp{why} and @samp{whhy} but not @samp{wy}, whereas
- @samp{wh*y} would match all three of these strings. This is a simpler
- way of writing the last @samp{*} example:
-
- @example
- awk '/\(c[ad]+r x\)/ @{ print @}' sample
- @end example
-
- @item ?
- This symbol is similar to @samp{*}, but the preceding expression can be
- matched once or not at all. For example:
-
- @example
- fe?d
- @end example
-
- @noindent
- will match @samp{fed} or @samp{fd}, but nothing else.@refill
-
- @item \
- This is used to suppress the special meaning of a character when
- matching. For example:
-
- @example
- \$
- @end example
-
- @noindent
- matches the character @samp{$}.
-
- The escape sequences used for string constants (@pxref{Constants}) are
- valid in regular expressions as well; they are also introduced by a
- @samp{\}.
- @end table
-
- In regular expressions, the @samp{*}, @samp{+}, and @samp{?} operators have
- the highest precedence, followed by concatenation, and finally by @samp{|}.
- As in arithmetic, parentheses can change how operators are grouped.@refill
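-
- The effect of these precedence rules can be seen by running the same
- input through an ungrouped and a grouped regexp. This is only a sketch;
- the four sample lines are made up for illustration.

```shell
input='Pat
a1
9z
xyz'

# Alternation binds loosest, so this means (^P)|([0-9]):
# lines that start with P, or that contain a digit anywhere.
loose=$(printf '%s\n' "$input" | awk '/^P|[0-9]/')

# With parentheses the anchor applies to both alternatives:
# lines that start with P or start with a digit.
tight=$(printf '%s\n' "$input" | awk '/^(P|[0-9])/')

echo "ungrouped:"; echo "$loose"
echo "grouped:";   echo "$tight"
```

- The line @samp{a1} matches only the ungrouped form, because its digit
- is not at the beginning of the line.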
-
- @node Case-sensitivity,, Regexp Operators, Regexp
- @subsection Case-sensitivity in Matching
-
- Case is normally significant in regular expressions, both when matching
- ordinary characters (i.e., not metacharacters), and inside character
- sets. Thus a @samp{w} in a regular expression matches only a lower case
- @samp{w} and not an upper case @samp{W}.
-
- The simplest way to do a case-independent match is to use a character
- set: @samp{[Ww]}. However, this can be cumbersome if you need to use it
- often; and it can make the regular expressions harder for humans to
- read. There are two other alternatives that you might prefer.
-
- One way to do a case-insensitive match at a particular point in the
- program is to convert the data to a single case, using the
- @code{tolower} or @code{toupper} built-in string functions (which we
- haven't discussed yet; @pxref{String Functions}). For example:
-
- @example
- tolower($1) ~ /foo/ @{ @dots{} @}
- @end example
-
- @noindent
- converts the first field to lower case before matching against it.
-
- Another method is to set the variable @code{IGNORECASE} to a nonzero
- value (@pxref{Built-in Variables}). When @code{IGNORECASE} is not zero,
- @emph{all} regexp operations ignore case. Changing the value of
- @code{IGNORECASE} dynamically controls the case sensitivity of your
- program as it runs. Case is significant by default because
- @code{IGNORECASE} (like most variables) is initialized to zero.
-
- @example
- x = "aB"
- if (x ~ /ab/) @dots{} # this test will fail
-
- IGNORECASE = 1
- if (x ~ /ab/) @dots{} # now it will succeed
- @end example
-
- You cannot generally use @code{IGNORECASE} to make certain rules
- case-insensitive and other rules case-sensitive, because there is no way
- to set @code{IGNORECASE} just for the pattern of a particular rule. To
- do this, you must use character sets or @code{tolower}. However, one
- thing you can do only with @code{IGNORECASE} is turn case-sensitivity on
- or off dynamically for all the rules at once.
-
- @code{IGNORECASE} can be set on the command line, or in a @code{BEGIN}
- rule. Setting @code{IGNORECASE} from the command line is a way to make
- a program case-insensitive without having to edit it.
-
- The value of @code{IGNORECASE} has no effect if @code{gawk} is in
- compatibility mode (@pxref{Command Line}). Case is always significant
- in compatibility mode.
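-
- When @code{IGNORECASE} is not available, the @code{tolower} approach
- described earlier still works. A minimal sketch, with made-up sample
- input:

```shell
# Match "foo" in the first field regardless of case by folding the
# field to lower case before the comparison.
hits=$(printf 'Foo one\nbar two\nFOO three\n' |
    awk 'tolower($1) ~ /foo/ { print $2 }')
echo "$hits"
```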
-
- @node Comparison Patterns, Boolean Patterns, Regexp, Patterns
- @section Comparison Expressions as Patterns
- @cindex comparison expressions as patterns
- @cindex pattern, comparison expressions
- @cindex relational operators
- @cindex operators, relational
-
- @dfn{Comparison patterns} test relationships such as equality between
- two strings or numbers. They are a special case of expression patterns
- (@pxref{Expression Patterns}). They are written with @dfn{relational
- operators}, which are a superset of those in C. Here is a table of
- them:
-
- @table @code
- @item @var{x} < @var{y}
- True if @var{x} is less than @var{y}.
-
- @item @var{x} <= @var{y}
- True if @var{x} is less than or equal to @var{y}.
-
- @item @var{x} > @var{y}
- True if @var{x} is greater than @var{y}.
-
- @item @var{x} >= @var{y}
- True if @var{x} is greater than or equal to @var{y}.
-
- @item @var{x} == @var{y}
- True if @var{x} is equal to @var{y}.
-
- @item @var{x} != @var{y}
- True if @var{x} is not equal to @var{y}.
-
- @item @var{x} ~ @var{y}
- True if @var{x} matches the regular expression described by @var{y}.
-
- @item @var{x} !~ @var{y}
- True if @var{x} does not match the regular expression described by @var{y}.
- @end table
-
- The operands of a relational operator are compared as numbers if they
- are both numbers. Otherwise they are converted to, and compared as,
- strings (@pxref{Conversion}). Strings are compared by comparing the
- first character of each, then the second character of each, and so on,
- until there is a difference. If the two strings are equal until the
- shorter one runs out, the shorter one is considered to be less than the
- longer one. Thus, @code{"10"} is less than @code{"9"} (because
- @samp{1} sorts before @samp{9}), and @code{"abc"} is less than
- @code{"abcd"}.
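-
- The difference between the two kinds of comparison can be checked
- directly. In this sketch the constants are chosen only to show each
- rule once; a true comparison prints 1 and a false one prints 0.

```shell
cmp=$(awk 'BEGIN {
    print (10 < 9)          # both numbers: numeric comparison, false
    print ("10" < "9")      # both strings: "1" sorts before "9", true
    print ("abc" < "abcd")  # equal until the shorter one runs out, true
}')
echo "$cmp"
```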
-
- The left operand of the @samp{~} and @samp{!~} operators is a string.
- The right operand is either a constant regular expression enclosed in
- slashes (@code{/@var{regexp}/}), or any expression, whose string value
- is used as a dynamic regular expression (@pxref{Regexp Usage}).
-
- The following example prints the second field of each input record
- whose first field is precisely @samp{foo}.
-
- @example
- awk '$1 == "foo" @{ print $2 @}' BBS-list
- @end example
-
- @noindent
- Contrast this with the following regular expression match, which would
- accept any record with a first field that contains @samp{foo}:
-
- @example
- awk '$1 ~ "foo" @{ print $2 @}' BBS-list
- @end example
-
- @noindent
- or, equivalently, this one:
-
- @example
- awk '$1 ~ /foo/ @{ print $2 @}' BBS-list
- @end example
-
- @node Boolean Patterns, Expression Patterns, Comparison Patterns, Patterns
- @section Boolean Operators and Patterns
- @cindex patterns, boolean
- @cindex boolean patterns
-
- A @dfn{boolean pattern} is an expression which combines other patterns
- using the @dfn{boolean operators} ``or'' (@samp{||}), ``and''
- (@samp{&&}), and ``not'' (@samp{!}). Whether the boolean pattern
- matches an input record depends on whether its subpatterns match.
-
- For example, the following command prints all records in the input file
- @file{BBS-list} that contain both @samp{2400} and @samp{foo}.@refill
-
- @example
- awk '/2400/ && /foo/' BBS-list
- @end example
-
- The following command prints all records in the input file
- @file{BBS-list} that contain @emph{either} @samp{2400} or @samp{foo}, or
- both.@refill
-
- @example
- awk '/2400/ || /foo/' BBS-list
- @end example
-
- The following command prints all records in the input file
- @file{BBS-list} that do @emph{not} contain the string @samp{foo}.
-
- @example
- awk '! /foo/' BBS-list
- @end example
-
- Note that boolean patterns are a special case of expression patterns
- (@pxref{Expression Patterns}); they are expressions that use the boolean
- operators. For complete information on the boolean operators, see
- @ref{Boolean Ops}.
-
- The subpatterns of a boolean pattern can be constant regular
- expressions, comparisons, or any other @code{gawk} expressions. Range
- patterns are not expressions, so they cannot appear inside boolean
- patterns. Likewise, the special patterns @code{BEGIN} and @code{END},
- which never match any input record, are not expressions and cannot
- appear inside boolean patterns.
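-
- Subpatterns of different kinds can be mixed freely. As a small sketch,
- using made-up input in place of @file{BBS-list}, the following combines
- a comparison with a negated regexp:

```shell
# Select records whose first field is at least 2400 and which do not
# contain "foo" anywhere.
fast=$(printf '2400 foo\n1200 bar\n2400 bar\n9600 baz\n' |
    awk '$1 >= 2400 && !/foo/')
echo "$fast"
```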
-
- @node Expression Patterns, Ranges, Boolean Patterns, Patterns
- @section Expressions as Patterns
-
- Any @code{awk} expression is also valid as a pattern in @code{gawk}.
- Then the pattern ``matches'' if the expression's value is nonzero (if a
- number) or nonnull (if a string).
-
- The expression is reevaluated each time the rule is tested against a new
- input record. If the expression uses fields such as @code{$1}, the
- value depends directly on the new input record's text; otherwise, it
- depends only on what has happened so far in the execution of the
- @code{awk} program, but that may still be useful.
-
- Comparison patterns are actually a special case of this. For
- example, the expression @code{$5 == "foo"} has the value 1 when the
- value of @code{$5} equals @code{"foo"}, and 0 otherwise; therefore, this
- expression as a pattern matches when the two values are equal.
-
- Boolean patterns are also special cases of expression patterns.
-
- A constant regexp as a pattern is also a special case of an expression
- pattern. @code{/foo/} as an expression has the value 1 if @samp{foo}
- appears in the current input record; thus, as a pattern, @code{/foo/}
- matches any record containing @samp{foo}.
-
- Other implementations of @code{awk} are less general than @code{gawk}:
- they allow comparison expressions, and boolean combinations thereof
- (optionally with parentheses), but not necessarily other kinds of
- expressions.
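- 
- As a small sketch of expressions used directly as patterns, the commands
- below select records with the bare expressions @code{NF} and
- @code{NR % 2}. The four-line input is made up for illustration, and we
- assume an @code{awk} that, like @code{gawk}, accepts arbitrary
- expressions as patterns.

```shell
# Hypothetical sample input: three non-blank records and one blank one.
input='alpha beta

gamma
delta'

# NF is the number of fields; it is zero (false) for a blank record,
# so the bare expression NF selects only the non-blank records.
nonblank=$(printf '%s\n' "$input" | awk 'NF')

# NR % 2 is 1 for odd-numbered records and 0 for even-numbered ones,
# so this selects records 1 and 3.
odd=$(printf '%s\n' "$input" | awk 'NR % 2')

printf '%s\n' "$nonblank"
printf '%s\n' "$odd"
```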
-
- @node Ranges, BEGIN/END, Expression Patterns, Patterns
- @section Specifying Record Ranges With Patterns
-
- @cindex range pattern
- @cindex patterns, range
- A @dfn{range pattern} is made of two patterns separated by a comma, of
- the form @code{@var{begpat}, @var{endpat}}. It matches ranges of
- consecutive input records. The first pattern @var{begpat} controls
- where the range begins, and the second one @var{endpat} controls where
- it ends. For example,@refill
-
- @example
- awk '$1 == "on", $1 == "off"'
- @end example
-
- @noindent
- prints every record between @samp{on}/@samp{off} pairs, inclusive.
-
- In more detail, a range pattern starts out by matching @var{begpat}
- against every input record; when a record matches @var{begpat}, the
- range pattern is @dfn{turned on}. The range pattern matches this
- record. As long as it stays turned on, it automatically matches every
- input record read. Meanwhile, it also matches @var{endpat} against
- every input record, starting with the one that turned it on; when that
- succeeds, the range pattern is turned off again for the following
- record. It then goes back to checking @var{begpat} against each record.
-
- The record that turns on the range pattern and the one that turns it
- off both match the range pattern. If you don't want to operate on
- these records, you can write @code{if} statements in the rule's action
- to distinguish them.
-
- It is possible for a range pattern to be turned both on and off by the
- same record, if both conditions are satisfied by that record. Then the
- action is executed for just that record.
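- 
- The following sketch illustrates the behavior described above. The
- input records are invented for the illustration; only the range
- patterns themselves come from the discussion.

```shell
# Hypothetical input; the first field marks where each range starts and stops.
ranges=$(printf '%s\n' 'skip 1' 'on 2' 'keep 3' 'off 4' 'skip 5' 'on 6' 'off 7' |
    awk '$1 == "on", $1 == "off"')
printf '%s\n' "$ranges"

# A record that satisfies both patterns turns the range on and off at once,
# so only that one record is selected.
single=$(printf '%s\n' 'a' 'b on off' 'c' | awk '/on/, /off/')
printf '%s\n' "$single"
```

- The first command prints the records numbered 2, 3, 4, 6 and 7; the
- second prints only @samp{b on off}.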
-
- @node BEGIN/END,, Ranges, Patterns
- @section @code{BEGIN} and @code{END} Special Patterns
-
- @cindex @code{BEGIN} special pattern
- @cindex patterns, @code{BEGIN}
- @cindex @code{END} special pattern
- @cindex patterns, @code{END}
- @code{BEGIN} and @code{END} are special patterns. They are not used to
- match input records. Rather, they are used for supplying start-up or
- clean-up information to your @code{awk} script. A @code{BEGIN} rule is
- executed, once, before the first input record has been read. An @code{END}
- rule is executed, once, after all the input has been read. For
- example:@refill
-
- @group
- @example
- awk 'BEGIN @{ print "Analysis of \"foo\"" @}
- /foo/ @{ ++foobar @}
- END @{ print "\"foo\" appears " foobar " times." @}' BBS-list
- @end example
- @end group
-
- This program finds out how many records in the input file
- @file{BBS-list} contain the string @samp{foo}. The @code{BEGIN} rule
- prints a title for the report. There is no need to use the @code{BEGIN}
- rule to initialize the counter @code{foobar} to zero, as @code{awk} does
- this for us automatically (@pxref{Variables}).
-
- The second rule increments the variable @code{foobar} every time a
- record containing the pattern @samp{foo} is read. The @code{END} rule
- prints the value of @code{foobar} at the end of the run.@refill
-
- The special patterns @code{BEGIN} and @code{END} cannot be used in ranges
- or with boolean operators.
-
- An @code{awk} program may have multiple @code{BEGIN} and/or @code{END}
- rules. They are executed in the order they appear, all the @code{BEGIN}
- rules at start-up and all the @code{END} rules at termination.
-
- Multiple @code{BEGIN} and @code{END} rules are useful for writing
- library functions, since each library file can have its own @code{BEGIN}
- or @code{END} rule to do its own initialization and/or cleanup. Note that
- the order in which library files are named on the command line
- controls the order in which their @code{BEGIN} and @code{END} rules are
- executed. Therefore you have to be careful to write such rules in
- library files so that it doesn't matter what order they are executed in.
- @xref{Command Line}, for more information on using library functions.
-
- If an @code{awk} program only has a @code{BEGIN} rule, and no other
- rules, then the program exits after the @code{BEGIN} rule has been run.
- (Older versions of @code{awk} used to keep reading and ignoring input
- until end of file was seen.) However, if an @code{END} rule exists as
- well, then the input will be read, even if there are no other rules in
- the program. This is necessary in case the @code{END} rule checks the
- @code{NR} variable.
-
- @code{BEGIN} and @code{END} rules must have actions; there is no default
- action for these rules since there is no current record when they run.
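- 
- These rules can be checked with a short sketch. The three-line input
- and the printed strings are invented for the illustration.

```shell
# A program with only a BEGIN rule exits without reading any input.
begin_only=$(printf 'a\nb\nc\n' | awk 'BEGIN { print "start" }')

# With an END rule the input is read, so NR holds the record count.
count=$(printf 'a\nb\nc\n' | awk 'END { print NR }')

# Multiple BEGIN and END rules run in the order they appear:
# all BEGIN rules at start-up, all END rules at termination.
order=$(awk 'BEGIN { printf "1 " }
             END   { printf "3 " }
             BEGIN { printf "2 " }
             END   { print "4" }' /dev/null)

printf '%s\n' "$begin_only" "$count" "$order"
```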
-
- @node Actions, Expressions, Patterns, Top
- @chapter Actions: Overview
- @cindex action, definition of
- @cindex curly braces
- @cindex action, curly braces
- @cindex action, separating statements
-
- An @code{awk} @dfn{program} or @dfn{script} consists of a series of
- @dfn{rules} and function definitions, interspersed. (Functions are
- described later; see @ref{User-defined}.)
-
- A rule contains a pattern and an @dfn{action}, either of which may be
- omitted. The purpose of the action is to tell @code{awk} what to do
- once a match for the pattern is found. Thus, the entire program
- looks somewhat like this:
-
- @example
- @r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
- @r{[}@var{pattern}@r{]} @r{[}@{ @var{action} @}@r{]}
- @dots{}
- function @var{name} (@var{args}) @{ @dots{} @}
- @dots{}
- @end example
-
- An action consists of one or more @code{awk} @dfn{statements}, enclosed
- in curly braces (@samp{@{} and @samp{@}}). Each statement specifies one
- thing to be done. The statements are separated by newlines or
- semicolons.
-
- The curly braces around an action must be used even if the action
- contains only one statement, or even if it contains no statements at
- all. However, if you omit the action entirely, omit the curly braces as
- well. (An omitted action is equivalent to @samp{@{ print $0 @}}.)
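- 
- The difference between an omitted action, an empty action and an action
- with several statements can be sketched as follows. The two-line input
- and the variable @code{n} are invented for the illustration.

```shell
# Omitted action: the matching record is printed, as with { print $0 }.
omitted=$(printf 'foo\nbar\n' | awk '/foo/')

# Empty action: the rule matches, but nothing is done.
empty=$(printf 'foo\nbar\n' | awk '/foo/ { }')

# Two statements in one action, separated by a semicolon.
two=$(printf 'foo\nbar\n' | awk '/foo/ { n++; print n, $0 }')

printf '[%s] [%s] [%s]\n' "$omitted" "$empty" "$two"
```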
-
- Here are the kinds of statement supported in @code{awk}:
-
- @itemize @bullet
- @item
- Expressions, which can call functions or assign values to variables
- (@pxref{Expressions}). Executing this kind of statement simply computes
- the value of the expression and then ignores it. This is useful when
- the expression has side effects (@pxref{Assignment Ops}).
-
- @item
- Control statements, which specify the control flow of @code{awk}
- programs. The @code{awk} language gives you C-like constructs
- (@code{if}, @code{for}, @code{while}, and so on) as well as a few
- special ones (@pxref{Statements}).@refill
-
- @item
- Compound statements, which consist of one or more statements enclosed in
- curly braces. A compound statement is used in order to put several
- statements together in the body of an @code{if}, @code{while}, @code{do}
- or @code{for} statement.
-
- @item
- Input control, using the @code{getline} function (@pxref{Getline}),
- and the @code{next} statement (@pxref{Next Statement}).
-
- @item
- Output statements, @code{print} and @code{printf}. @xref{Printing}.
-
- @item
- Deletion statements, for deleting array elements. @xref{Delete}.
- @end itemize
-
- @iftex
- The next two chapters cover expressions and control statements in
- detail, respectively. We go on to treat arrays and built-in
- functions, both of which are used in expressions. Then we proceed
- to discuss how to define your own functions.
- @end iftex
-
- @node Expressions, Statements, Actions, Top
- @chapter Actions: Expressions
- @cindex expression
-
- Expressions are the basic building block of @code{awk} actions. An
- expression evaluates to a value, which you can print, test, store in a
- variable or pass to a function.
-
- Beyond that, an expression can also assign a new value to a variable
- or a field, using an assignment operator.
-
- An expression can serve as a statement on its own. Most other kinds of
- statement contain one or more expressions which specify data to be
- operated on. As in other languages, expressions in @code{awk} include
- variables, array references, constants, and function calls, as well as
- combinations of these with various operators.
-
- @menu
- * Constants:: String, numeric, and regexp constants.
- * Variables:: Variables give names to values for later use.
- * Arithmetic Ops:: Arithmetic operations (@samp{+}, @samp{-}, etc.)
- * Concatenation:: Concatenating strings.
- * Comparison Ops:: Comparison of numbers and strings with @samp{<}, etc.
- * Boolean Ops:: Combining comparison expressions using boolean operators
- @samp{||} (``or''), @samp{&&} (``and'') and @samp{!} (``not'').
-
- * Assignment Ops:: Changing the value of a variable or a field.
- * Increment Ops:: Incrementing the numeric value of a variable.
-
- * Conversion:: The conversion of strings to numbers and vice versa.
- * Conditional Exp:: Conditional expressions select between two subexpressions
- under control of a third subexpression.
- * Function Calls:: A function call is an expression.
- * Precedence:: How various operators nest.
- @end menu
-
- @node Constants, Variables, Expressions, Expressions
- @section Constant Expressions
- @cindex constants, types of
- @cindex string constants
-
- The simplest type of expression is the @dfn{constant}, which always has
- the same value. There are three types of constant: numeric constants,
- string constants, and regular expression constants.
-
- @cindex numeric constant
- @cindex numeric value
- A @dfn{numeric constant} stands for a number. This number can be an
- integer, a decimal fraction, or a number in scientific (exponential)
- notation. Note that all numeric values are represented within
- @code{awk} in double-precision floating point. Here are some examples
- of numeric constants, which all have the same value:
-
- @example
- 105
- 1.05e+2
- 1050e-1
- @end example
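- 
- As a quick check that these three constants denote the same number, the
- following sketch prints them and compares them inside a @code{BEGIN}
- rule:

```shell
# All three spellings denote the number 105.
printed=$(awk 'BEGIN { print 105, 1.05e+2, 1050e-1 }')

# The comparison yields 1 (true) because the values are equal.
same=$(awk 'BEGIN { print (105 == 1.05e+2 && 105 == 1050e-1) }')

printf '%s\n' "$printed" "$same"
```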
-
- A string constant consists of a sequence of characters enclosed in
- double-quote marks. For example:
-
- @example
- "parrot"
- @end example
-
- @noindent
- @c @cindex differences between @code{gawk} and @code{awk}
- represents the string whose contents are @samp{parrot}. Strings in
- @code{gawk} can be of any length and they can contain all the possible
- 8-bit character codes, including ASCII NUL. Other @code{awk}
- implementations may have difficulty with some character codes.@refill
-
- @cindex escape sequence notation
- Some characters cannot be included literally in a string constant. You
- represent them instead with @dfn{escape sequences}, which are character
- sequences beginning with a backslash (@samp{\}).
-
- One use of an escape sequence is to include a double-quote character in
- a string constant. Since a plain double-quote would end the string, you
- must use @samp{\"} to represent a single double-quote character as a
- part of the string. Backslash itself is another character that can't be
- included normally; you write @samp{\\} to put one backslash in the
- string. Thus, the string whose contents are the two characters
- @samp{"\} must be written @code{"\"\\"}.
-
- Another use of backslash is to represent unprintable characters
- such as newline. While there is nothing to stop you from writing most
- of these characters directly in a string constant, they may look ugly.
-
- Here is a table of all the escape sequences used in @code{awk}:
-
- @table @code
- @item \\
- Represents a literal backslash, @samp{\}.
-
- @item \a
- Represents the ``alert'' character, control-g, ASCII code 7.
-
- @item \b
- Represents a backspace, control-h, ASCII code 8.
-
- @item \f
- Represents a formfeed, control-l, ASCII code 12.
-
- @item \n
- Represents a newline, control-j, ASCII code 10.
-
- @item \r
- Represents a carriage return, control-m, ASCII code 13.
-
- @item \t
- Represents a horizontal tab, control-i, ASCII code 9.
-
- @item \v
- Represents a vertical tab, control-k, ASCII code 11.
-
- @item \@var{nnn}
- Represents the octal value @var{nnn}, where @var{nnn} are one to three
- digits between 0 and 7. For example, the code for the ASCII ESC
- (escape) character is @samp{\033}.@refill
-
- @item \x@var{hh@dots{}}
- Represents the hexadecimal value @var{hh}, where @var{hh} are hexadecimal
- digits (@samp{0} through @samp{9} and either @samp{A} through @samp{F} or
- @samp{a} through @samp{f}). Like the same construct in ANSI C, the escape
- sequence continues until the first non-hexadecimal digit is seen. However,
- using more than two hexadecimal digits produces undefined results.@refill
- @end table
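- 
- A few of these escape sequences can be tried out with the sketch below.
- The awk programs are enclosed in single quotes, so the shell passes the
- backslashes through to @code{awk} unchanged; the sample strings are
- invented for the illustration.

```shell
# The two-character string "\ must be written "\"\\".
quoted=$(awk 'BEGIN { print "\"\\" }')

# \t stands for a horizontal tab.
tabbed=$(awk 'BEGIN { print "a\tb" }')

# \101 is octal for ASCII code 65, the letter A.
octal=$(awk 'BEGIN { print "\101" }')

printf '%s\n' "$quoted" "$tabbed" "$octal"
```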
-
- A constant regexp is a regular expression description enclosed in
- slashes, such as @code{/^beginning and end$/}. Most regexps used in
- @code{awk} programs are constant, but the @samp{~} and @samp{!~}
- operators can also match computed or ``dynamic'' regexps (@pxref{Regexp
- Usage}).
-
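- Constant regexps are used mainly with the @samp{~} and @samp{!~}
- operators, or alone as patterns. You cannot assign the regexp itself to
- a variable or print it: a constant regexp used anywhere else stands for
- the result of matching it against the current input record
- (@pxref{Expression Patterns}). They are therefore not truly
- expressions in the usual sense.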
-
- @node Variables, Arithmetic Ops, Constants, Expressions
- @section Variables
- @cindex variables, user-defined
- @cindex user-defined variables
-
- Variables let you give names to values and refer to them later. You have
- already seen variables in many of the examples. The name of a variable
- must be a sequence of letters, digits and underscores, but it may not begin
- with a digit. Case is significant in variable names; @code{a} and @code{A}
- are distinct variables.
-
- A variable name is a valid expression by itself; it represents the
- variable's current value. Variables are given new values with
- @dfn{assignment operators} and @dfn{increment operators}.
- @xref{Assignment Ops}.
-
- A few variables have special built-in meanings, such as @code{FS}, the
- field separator, and @code{NF}, the number of fields in the current
- input record. @xref{Built-in Variables}, for a list of them. These
- built-in variables can be used and assigned just like all other
- variables, but their values are also used or changed automatically by
- @code{awk}. Each built-in variable's name is made entirely of upper case
- letters.
-
- Variables in @code{awk} can be assigned either numeric values or string
- values. By default, variables are initialized to the null string, which
- is effectively zero if converted to a number. So there is no need to
- ``initialize'' each variable explicitly in @code{awk}, the way you would
- need to do in C or most other traditional programming languages.
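- 
- The following sketch shows both points: an unset variable acts as the
- null string and as zero, and names that differ only in case are
- distinct. The variable names @code{x}, @code{a} and @code{A} are
- arbitrary.

```shell
# x was never assigned: it is the null string, has length 0,
# and converts to the number 0.
unset_demo=$(awk 'BEGIN { print "[" x "]", x + 0, length(x) }')

# a and A are two different variables.
case_demo=$(awk 'BEGIN { a = 1; A = 2; print a, A }')

printf '%s\n' "$unset_demo" "$case_demo"
```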
-
- @menu
- * Assignment Options:: Setting variables on the command line and a summary
- of command line syntax. This is an advanced method
- of input.
- @end menu
-
- @node Assignment Options,, Variables, Variables
- @subsection Assigning Variables on the Command Line
-
- You can set any @code{awk} variable by including a @dfn{variable assignment}
- among the arguments on the command line when you invoke @code{awk}
- (@pxref{Command Line}). Such an assignment has this form:
-
- @example
- @var{variable}=@var{text}
- @end example
-
- @noindent
- With it, you can set a variable either at the beginning of the
- @code{awk} run or in between input files.
-
- If you precede the assignment with the @samp{-v} option, like this:
-
- @example
- -v @var{variable}=@var{text}
- @end example
-
- @noindent
- then the variable is set at the very beginning, before even the
- @code{BEGIN} rules are run. The @samp{-v} option and its assignment
- must precede all the file name arguments.
-
- Otherwise, the variable assignment is performed at a time determined by
- its position among the input file arguments: after the processing of the
- preceding input file argument. For example:
-
- @example
- awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list
- @end example
-
- @noindent
- prints the value of field number @code{n} for all input records. Before
- the first file is read, the command line sets the variable @code{n}
- equal to 4. This causes the fourth field to be printed in lines from
- the file @file{inventory-shipped}. After the first file has finished,
- but before the second file is started, @code{n} is set to 2, so that the
- second field is printed in lines from @file{BBS-list}.
-
- Command line arguments are made available for explicit examination by
- the @code{awk} program in an array named @code{ARGV} (@pxref{Built-in
- Variables}).
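- 
- The timing of the two kinds of assignment can be sketched as follows.
- The two temporary files and their contents are made up for the
- illustration, and we assume that @code{mktemp} is available.

```shell
# Hypothetical input files, each holding one three-field record.
tmpdir=$(mktemp -d)
printf 'a1 a2 a3\n' > "$tmpdir/first"
printf 'b1 b2 b3\n' > "$tmpdir/second"

# n is 3 while the first file is read and 1 while the second is read.
positional=$(awk '{ print $n }' n=3 "$tmpdir/first" n=1 "$tmpdir/second")

# With -v the value is already set when the BEGIN rule runs; a plain
# assignment argument has not been processed at that point.
with_v=$(awk -v n=3 'BEGIN { print "n=" n }')
without_v=$(awk 'BEGIN { print "n=" n }' n=3)

rm -r "$tmpdir"
printf '%s\n' "$positional" "$with_v" "$without_v"
```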
-
- @node Arithmetic Ops, Concatenation, Variables, Expressions
- @section Arithmetic Operators
- @cindex arithmetic operators
- @cindex operators, arithmetic
- @cindex addition
- @cindex subtraction
- @cindex multiplication
- @cindex division
- @cindex remainder
- @cindex quotient
- @cindex exponentiation
-
- The @code{awk} language uses the common arithmetic operators when
- evaluating expressions. All of these arithmetic operators follow normal
- precedence rules, and work as you would expect them to. This example
- divides field three by field four, adds field two, stores the result
- into field one, and prints the resulting altered input record:
-
- @example
- awk '@{ $1 = $2 + $3 / $4; print @}' inventory-shipped
- @end example
-
- The arithmetic operators in @code{awk} are:
-
- @table @code
- @item @var{x} + @var{y}
- Addition.
-
- @item @var{x} - @var{y}
- Subtraction.
-
- @item - @var{x}
- Negation.
-
- @item @var{x} * @var{y}
- Multiplication.
-
- @item @var{x} / @var{y}
- Division. Since all numbers in @code{awk} are double-precision
- floating point, the result is not rounded to an integer: @code{3 / 4}
- has the value 0.75.
-
- @item @var{x} % @var{y}
- @c @cindex differences between @code{gawk} and @code{awk}
- Remainder. The quotient is rounded toward zero to an integer,
- multiplied by @var{y} and this result is subtracted from @var{x}.
- This operation is sometimes known as ``trunc-mod''. The following
- relation always holds:
-
- @example
- b * int(a / b) + (a % b) == a
- @end example
-
- One undesirable effect of this definition of remainder is that
- @code{@var{x} % @var{y}} is negative (or zero) if @var{x} is negative. Thus,
-
- @example
- -17 % 8 = -1
- @end example
-
- In other @code{awk} implementations, the sign of the remainder
- may be machine dependent.
-
- @item @var{x} ^ @var{y}
- @itemx @var{x} ** @var{y}
- Exponentiation: @var{x} raised to the @var{y} power. @code{2 ^ 3} has
- the value 8. The character sequence @samp{**} is equivalent to
- @samp{^}.
- @end table
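- 
- The examples in the table above can be checked with a short sketch.
- The variable names @code{a} and @code{b} follow the relation given for
- the remainder operator; the expected values assume the truncating
- behavior that @code{gawk} guarantees.

```shell
# Division is not rounded, the remainder takes the sign of the
# dividend, and ^ is exponentiation.
arith=$(awk 'BEGIN { print 3 / 4, -17 % 8, 2 ^ 3 }')

# The trunc-mod relation: b * int(a / b) + (a % b) == a.
identity=$(awk 'BEGIN { a = -17; b = 8; print (b * int(a / b) + (a % b) == a) }')

printf '%s\n' "$arith" "$identity"
```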
-
- @node Concatenation, Comparison Ops, Arithmetic Ops, Expressions
- @section String Concatenation
-
- @cindex string operators
- @cindex operators, string
- @cindex concatenation
- There is only one string operation: concatenation. It does not have a
- specific operator to represent it. Instead, concatenation is performed by
- writing expressions next to one another, with no operator. For example:
-
- @example
- awk '@{ print "Field number one: " $1 @}' BBS-list
- @end example
-
- @noindent
- produces, for the first record in @file{BBS-list}:
-
- @example
- Field number one: aardvark
- @end example
-
- Without the space in the string constant after the @samp{:}, the line
- would run together. For example:
-
- @example
- awk '@{ print "Field number one:" $1 @}' BBS-list
- @end example
-
- @noindent
- produces, for the first record in @file{BBS-list}:
-
- @example
- Field number one:aardvark
- @end example
-
- Since string concatenation does not have an explicit operator, it is
-