Text File | 1990-06-07 | 48.5 KB | 1,399 lines |
- the @code{BEGIN} rule was executed. Some applications came to depend
- upon this ``feature''. When @code{awk} was changed to be more consistent,
- the @samp{-v} option was added to accommodate applications that depended
- upon this old behaviour.
-
- The variable assignment feature is most useful for assigning to variables
- such as @code{RS}, @code{OFS}, and @code{ORS}, which control input and
- output formats, before scanning the data files. It is also useful for
- controlling state if multiple passes are needed over a data file. For
- example:@refill
-
- @cindex multiple passes over data
- @cindex passes, multiple
- @example
- awk 'pass == 1 @{ @var{pass 1 stuff} @}
- pass == 2 @{ @var{pass 2 stuff} @}' pass=1 datafile pass=2 datafile
- @end example
-
- @node AWKPATH Variable,, Other Arguments, Command Line
- @section The @code{AWKPATH} Environment Variable
- @cindex @code{AWKPATH} environment variable
- @cindex search path
- @cindex directory search
- @cindex path, search
- @c @cindex differences between @code{gawk} and @code{awk}
-
- The previous section described how @code{awk} program files can be named
- on the command line with the @samp{-f} option. In some @code{awk}
- implementations, you must supply a precise path name for each program
- file, unless the file is in the current directory.
-
- But in @code{gawk}, if the file name supplied in the @samp{-f} option
- does not contain a @samp{/}, then @code{gawk} searches a list of
- directories (called the @dfn{search path}), one by one, looking for a
- file with the specified name.
-
- The search path is actually a string containing directory names
- separated by colons. @code{gawk} gets its search path from the
- @code{AWKPATH} environment variable. If that variable does not exist,
- @code{gawk} uses the default path, which is
- @samp{.:/usr/lib/awk:/usr/local/lib/awk}.@refill
-
- The search path feature is particularly useful for building libraries
- of @code{awk} functions. The library files can be placed in a
- standard directory that is in the default path, and then specified on
- the command line with a short file name. Otherwise, the full file name
- would have to be typed for each file.
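-
- As a rough sketch of such a setup (the directory @file{$HOME/awklib} and
- the file names @file{twice.awk} and @file{main.awk} are hypothetical,
- chosen only for illustration), a library file might be picked up through
- the search path like this:
-
- @example
- # twice.awk is assumed to live in $HOME/awklib; main.awk is in the
- # current directory and calls a function defined in twice.awk
- AWKPATH=.:$HOME/awklib gawk -f twice.awk -f main.awk datafile
- @end example
-
- @noindent
- Because neither file name contains a @samp{/}, @code{gawk} looks for each
- one in the current directory first and then in @file{$HOME/awklib}.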
-
- Path searching is not done if @code{gawk} is in compatibility mode.
- @xref{Command Line}.
-
- @strong{Note:} if you want files in the current directory to be found,
- you must include the current directory in the path, either by writing
- @file{.} as an entry in the path, or by writing a null entry in the
- path. (A null entry is indicated by starting or ending the path with a
- colon, or by placing two colons next to each other (@samp{::}).) If the
- current directory is not included in the path, then files cannot be
- found in the current directory. This path search mechanism is identical
- to the shell's.
- @c someday, @cite{The Bourne Again Shell}....
-
- @node Language History, Gawk Summary, Command Line, Top
- @chapter The Evolution of the @code{awk} Language
-
- This manual describes the GNU implementation of @code{awk}, which is patterned
- after the System V Release 4 version. Many @code{awk} users are only familiar
- with the original @code{awk} implementation in Version 7 Unix, which is also
- the basis for the version in Berkeley Unix. This chapter briefly describes
- the evolution of the @code{awk} language.
-
- @menu
- * V7/S5R3.1:: The major changes between V7 and System V Release 3.1.
-
- * S5R4:: The minor changes between System V Releases 3.1 and 4.
-
- * S5R4/GNU:: The extensions in @code{gawk} not in System V Release 4.
- @end menu
-
- @node V7/S5R3.1, S5R4, Language History, Language History
- @section Major Changes Between V7 and S5R3.1
-
- The @code{awk} language evolved considerably between the release of
- Version 7 Unix (1978) and the new version first made widely available in
- System V Release 3.1 (1987). This section summarizes the changes, with
- cross-references to further details.
-
- @itemize @bullet
- @item
- The requirement for @samp{;} to separate rules on a line
- (@pxref{Statements/Lines}).
-
- @item
- User-defined functions, and the @code{return} statement
- (@pxref{User-defined}).
-
- @item
- The @code{delete} statement (@pxref{Delete}).
-
- @item
- The @code{do}-@code{while} statement (@pxref{Do Statement}).
-
- @item
- The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand} and
- @code{srand} (@pxref{Numeric Functions}).
-
- @item
- The built-in functions @code{gsub}, @code{sub}, and @code{match}
- (@pxref{String Functions}).
-
- @item
- The built-in functions @code{close} and @code{system} (@pxref{I/O
- Functions}).
-
- @item
- The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART},
- and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}).
-
- @item
- The conditional expression using the operators @samp{?} and @samp{:}
- (@pxref{Conditional Exp}).
-
- @item
- The exponentiation operator @samp{^} (@pxref{Arithmetic Ops}) and its
- assignment operator form @samp{^=} (@pxref{Assignment Ops}).@refill
-
- @item
- C-compatible operator precedence, which breaks some old @code{awk}
- programs (@pxref{Precedence}).
-
- @item
- Regexps as the value of @code{FS} (@pxref{Field Separators}), or as the
- third argument to the @code{split} function (@pxref{String
- Functions}).@refill
-
- @item
- Dynamic regexps as operands of the @samp{~} and @samp{!~} operators
- (@pxref{Regexp Usage}).
-
- @item
- Escape sequences (@pxref{Constants}) in regexps.@refill
-
- @item
- The escape sequences @samp{\b}, @samp{\f}, and @samp{\r}
- (@pxref{Constants}).
-
- @item
- Redirection of input for the @code{getline} function (@pxref{Getline}).
-
- @item
- Multiple @code{BEGIN} and @code{END} rules (@pxref{BEGIN/END}).
-
- @item
- Simulation of multidimensional arrays (@pxref{Multi-dimensional}).
- @end itemize
-
- @node S5R4, S5R4/GNU, V7/S5R3.1, Language History
- @section Minor Changes Between S5R3.1 and S5R4
-
- The System V Release 4 version of Unix @code{awk} added these features:
-
- @itemize @bullet
- @item
- The @code{ENVIRON} variable (@pxref{Built-in Variables}).
-
- @item
- Multiple @samp{-f} options on the command line (@pxref{Command Line}).
-
- @item
- The @samp{-v} option for assigning variables before program execution begins
- (@pxref{Command Line}).
-
- @item
- The @samp{--} option for terminating command line options.
-
- @item
- The @samp{\a}, @samp{\v}, and @samp{\x} escape sequences (@pxref{Constants}).
-
- @item
- A defined return value for the @code{srand} built-in function
- (@pxref{Numeric Functions}).
-
- @item
- The @code{toupper} and @code{tolower} built-in string functions
- for case translation (@pxref{String Functions}).
-
- @item
- A cleaner specification for the @samp{%c} format-control letter in the
- @code{printf} function (@pxref{Printf}).
-
- @item
- The use of constant regexps such as @code{/foo/} as expressions, where
- they are equivalent to use of the matching operator, as in @code{$0 ~
- /foo/}.
- @end itemize
-
- @node S5R4/GNU, , S5R4, Language History
- @section Extensions in @code{gawk} Not in S5R4
-
- The GNU implementation, @code{gawk}, adds these features:
-
- @itemize @bullet
- @item
- The @code{AWKPATH} environment variable for specifying a path search for
- the @samp{-f} command line option (@pxref{Command Line}).
-
- @item
- The @samp{-C} and @samp{-V} command line options (@pxref{Command Line}).
-
- @item
- The @code{IGNORECASE} variable and its effects (@pxref{Case-sensitivity}).
-
- @item
- The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr}, and
- @file{/dev/fd/@var{n}} file name interpretation (@pxref{Special Files}).
-
- @item
- The @samp{-c} option to turn off these extensions (@pxref{Command Line}).
-
- @item
- The @samp{-a} and @samp{-e} options to specify the syntax of regular
- expressions that @code{gawk} will accept (@pxref{Command Line}).
- @end itemize
-
- @node Gawk Summary, Sample Program, Language History, Top
- @appendix @code{gawk} Summary
-
- @ignore
- See, man pages are good for something. This chapter started life as the
- gawk.1 man page for 2.11.
- @end ignore
-
- This appendix provides a brief summary of the @code{gawk} command line and the
- @code{awk} language. It is designed to serve as a ``quick reference.'' It is
- therefore terse, but complete.
-
- @menu
- * Command Line Summary:: Recapitulation of the command line.
- * Language Summary:: A terse review of the language.
- * Variables/Fields:: Variables, fields, and arrays.
- * Rules Summary:: Patterns and Actions, and their component parts.
- * Functions Summary:: Defining and calling functions.
- @end menu
-
- @node Command Line Summary, Language Summary, Gawk Summary, Gawk Summary
- @appendixsec Command Line Options Summary
-
- The command line consists of options to @code{gawk} itself, the
- @code{awk} program text (if not supplied via the @samp{-f} option), and
- values to be made available in the @code{ARGC} and @code{ARGV}
- predefined @code{awk} variables:
-
- @example
- awk @r{[@code{-F@var{fs}}] [@code{-v @var{var}=@var{val}}] [@code{-V}] [@code{-C}] [@code{-c}] [@code{-a}] [@code{-e}] [@code{--}]} '@var{program}' @var{file} @dots{}
- awk @r{[@code{-F@var{fs}}] @code{-f @var{program-file}} [@code{-f @var{program-file} @dots{}}] [@code{-v @var{var}=@var{val}}] [@code{-V}] [@code{-C}] [@code{-c}] [@code{-a}] [@code{-e}] [@code{--}]} @var{file} @dots{}
- @end example
-
- The options that @code{gawk} accepts are:
-
- @table @code
- @item -F@var{fs}
- Use @var{fs} for the input field separator (the value of the @code{FS}
- predefined variable).
-
- @item -f @var{program-file}
- Read the @code{awk} program source from the file @var{program-file}, instead
- of from the first command line argument.
-
- @item -v @var{var}=@var{val}
- Assign the variable @var{var} the value @var{val} before program execution
- begins.
-
- @item -a
- Specifies use of traditional @code{awk} syntax for regular expressions.
- This means that @samp{\} can be used to quote regular expression
- operators inside of square brackets, just as it can be outside of them.
-
- @item -e
- Specifies use of @code{egrep} syntax for regular expressions. This
- means that @samp{\} does not serve as a quoting character inside of
- square brackets.
-
- @item -c
- Specifies compatibility mode, in which @code{gawk} extensions are turned
- off.
-
- @item -V
- Print version information for this particular copy of @code{gawk} on the error
- output. This option may disappear in a future version of @code{gawk}.
-
- @item -C
- Print the short version of the General Public License on the error
- output. This option may disappear in a future version of @code{gawk}.
-
- @item --
- Signal the end of options. This is useful to allow further arguments to the
- @code{awk} program itself to start with a @samp{-}. This is mainly for
- consistency with the argument parsing conventions of POSIX.
- @end table
-
- Any other options are flagged as invalid, but are otherwise ignored.
- @xref{Command Line}, for more details.
-
- @node Language Summary, Variables/Fields, Command Line Summary, Gawk Summary
- @appendixsec Language Summary
-
- An @code{awk} program consists of a sequence of pattern-action statements
- and optional function definitions.
-
- @example
- @var{pattern} @{ @var{action statements} @}
-
- function @var{name}(@var{parameter list}) @{ @var{action statements} @}
- @end example
-
- @code{gawk} first reads the program source from the
- @var{program-file}(s) if specified, or from the first non-option
- argument on the command line. The @samp{-f} option may be used multiple
- times on the command line. @code{gawk} reads the program text from all
- the @var{program-file} files, effectively concatenating them in the
- order they are specified. This is useful for building libraries of
- @code{awk} functions, without having to include them in each new
- @code{awk} program that uses them. To use a library function in a file
- from a program typed in on the command line, specify @samp{-f /dev/tty};
- then type your program, and end it with a @kbd{C-d}. @xref{Command
- Line}.
-
- The environment variable @code{AWKPATH} specifies a search path to use
- when finding source files named with the @samp{-f} option. If the
- variable @code{AWKPATH} is not set, @code{gawk} uses the default path,
- @samp{.:/usr/lib/awk:/usr/local/lib/awk}. If a file name given to the
- @samp{-f} option contains a @samp{/} character, no path search is
- performed. @xref{AWKPATH Variable}, for a full description of the
- @code{AWKPATH} environment variable.@refill
-
- @code{gawk} compiles the program into an internal form, and then proceeds to
- read each file named in the @code{ARGV} array. If there are no files named
- on the command line, @code{gawk} reads the standard input.
-
- If a ``file'' named on the command line has the form
- @samp{@var{var}=@var{val}}, it is treated as a variable assignment: the
- variable @var{var} is assigned the value @var{val}.
-
- For each line in the input, @code{gawk} tests to see if it matches any
- @var{pattern} in the @code{awk} program. For each pattern that the line
- matches, the associated @var{action} is executed.
-
- @node Variables/Fields, Rules Summary, Language Summary, Gawk Summary
- @appendixsec Variables and Fields
-
- @code{awk} variables are dynamic; they come into existence when they are
- first used. Their values are either floating-point numbers or strings.
- @code{awk} also has one-dimensional arrays; multidimensional arrays
- may be simulated. There are several predefined variables that
- @code{awk} sets as a program runs; these are summarized below.
-
- @menu
- * Fields Summary:: Input field splitting.
- * Built-in Summary:: @code{awk}'s built-in variables.
- * Arrays Summary:: Using arrays.
- * Data Type Summary:: Values in @code{awk} are numbers or strings.
- @end menu
-
- @node Fields Summary, Built-in Summary, Variables/Fields, Variables/Fields
- @appendixsubsec Fields
-
- As each input line is read, @code{gawk} splits the line into
- @var{fields}, using the value of the @code{FS} variable as the field
- separator. If @code{FS} is a single character, fields are separated by
- that character. Otherwise, @code{FS} is expected to be a full regular
- expression. In the special case that @code{FS} is a single blank,
- fields are separated by runs of blanks and/or tabs. Note that the value
- of @code{IGNORECASE} (@pxref{Case-sensitivity}) also affects how fields
- are split when @code{FS} is a regular expression.
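-
- The splitting rules above can be tried out with two short commands. This
- is only a sketch, assuming a POSIX shell and made-up input strings:
-
- @example
- # FS is longer than one character, so it is used as a regular expression:
- # every run of digits separates two fields
- echo 'a1b22c' | awk 'BEGIN @{ FS = "[0-9]+" @} @{ print $1, $2, $3 @}'
-
- # FS keeps its default value (a single blank), so runs of blanks
- # separate fields and leading blanks are skipped
- echo '  x   y ' | awk '@{ print NF, $1, $2 @}'
- @end example
-
- @noindent
- The first command should print @samp{a b c} and the second @samp{2 x y}.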
-
- Each field in the input line may be referenced by its position, @code{$1},
- @code{$2}, and so on. @code{$0} is the whole line. The value of a field may
- be assigned to as well. Field numbers need not be constants:
-
- @example
- n = 5
- print $n
- @end example
-
- @noindent
- prints the fifth field in the input line. The variable @code{NF} is set to
- the total number of fields in the input line.
-
- References to nonexistent fields (i.e., fields after @code{$NF}) return
- the null string. However, assigning to a nonexistent field (e.g.,
- @code{$(NF+2) = 5}) increases the value of @code{NF}, creates any
- intervening fields with the null string as their value, and causes the
- value of @code{$0} to be recomputed, with the fields being separated by
- the value of @code{OFS}.@refill
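-
- A small experiment makes the effect visible. The input line and the
- choice of @samp{-} for @code{OFS} are arbitrary, picked only so that the
- empty intervening field shows up in the output:
-
- @example
- echo 'a b c' | awk 'BEGIN @{ OFS = "-" @} @{
-     $(NF+2) = 5     # the record had 3 fields; this creates $4 (empty) and $5
-     print NF        # NF has grown to 5
-     print           # $0 was rebuilt with OFS between the fields
- @}'
- @end example
-
- @noindent
- This should print @samp{5} followed by @samp{a-b-c--5}.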
-
- @xref{Reading Files}, for a full description of the way @code{awk} defines
- and uses fields.
-
- @node Built-in Summary, Arrays Summary, Fields Summary, Variables/Fields
- @appendixsubsec Built-in Variables
-
- @code{awk}'s built-in variables are:
-
- @table @code
- @item ARGC
- The number of command line arguments (not including options or the
- @code{awk} program itself).
-
- @item ARGV
- The array of command line arguments. The array is indexed from 0 to
- @code{ARGC} - 1. Dynamically changing the contents of @code{ARGV} can control
- the files used for data.@refill
-
- @item ENVIRON
- An array containing the values of the environment variables. The array
- is indexed by variable name, each element being the value of that
- variable. Thus, the environment variable @code{HOME} would be in
- @code{ENVIRON["HOME"]}. Its value might be @file{/u/close}.
-
- Changing this array does not affect the environment seen by programs
- which @code{gawk} spawns via redirection or the @code{system} function.
- (This may change in a future version of @code{gawk}.)
-
- Some operating systems do not have environment variables.
- The array @code{ENVIRON} is empty when running on these systems.
-
- @item FILENAME
- The name of the current input file. If no files are specified on the command
- line, the value of @code{FILENAME} is @samp{-}.
-
- @item FNR
- The input record number in the current input file.
-
- @item FS
- The input field separator, a blank by default.
-
- @item IGNORECASE
- The case-sensitivity flag for regular expression operations. If
- @code{IGNORECASE} has a nonzero value, then pattern matching in rules,
- field splitting with @code{FS}, regular expression matching with
- @samp{~} and @samp{!~}, and the @code{gsub}, @code{index}, @code{match},
- @code{split} and @code{sub} predefined functions all ignore case
- when doing regular expression operations.@refill
-
- @item NF
- The number of fields in the current input record.
-
- @item NR
- The total number of input records seen so far.
-
- @item OFMT
- The output format for numbers, @code{"%.6g"} by default.
-
- @item OFS
- The output field separator, a blank by default.
-
- @item ORS
- The output record separator, by default a newline.
-
- @item RS
- The input record separator, by default a newline. @code{RS} is exceptional
- in that only the first character of its string value is used for separating
- records. If @code{RS} is set to the null string, then records are separated by
- blank lines. When @code{RS} is set to the null string, then the newline
- character always acts as a field separator, in addition to whatever value
- @code{FS} may have.@refill
-
- @item RSTART
- The index of the first character matched by @code{match}; 0 if no match.
-
- @item RLENGTH
- The length of the string matched by @code{match}; @minus{}1 if no match.
-
- @item SUBSEP
- The string used to separate multiple subscripts in array elements, by
- default @code{"\034"}.
- @end table
-
- @xref{Built-in Variables}.
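-
- As an illustration of how @code{RSTART} and @code{RLENGTH} are set, the
- following sketch calls @code{match} once with a regexp that matches and
- once with one that does not. The string @code{"foobar"} is just sample
- data:
-
- @example
- awk 'BEGIN @{
-     if (match("foobar", /o+/))
-         print RSTART, RLENGTH     # position and length of "oo"
-     match("foobar", /z/)
-     print RSTART, RLENGTH         # values after a failed match
- @}'
- @end example
-
- @noindent
- This should print @samp{2 2} and then @samp{0 -1}.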
-
- @node Arrays Summary, Data Type Summary, Built-in Summary, Variables/Fields
- @appendixsubsec Arrays
-
- Arrays are subscripted with an expression between square brackets
- (@samp{[} and @samp{]}). The expression may be either a number or
- a string. Since arrays are associative, string indices are meaningful
- and are not converted to numbers.
-
- If you use multiple expressions separated by commas inside the square
- brackets, then the array subscript is a string consisting of the
- concatenation of the individual subscript values, converted to strings,
- separated by the subscript separator (the value of @code{SUBSEP}).
-
- The special operator @code{in} may be used in an @code{if} or
- @code{while} statement to see if an array has an index consisting of a
- particular value.
-
- @group
- @example
- if (val in array)
- print array[val]
- @end example
- @end group
-
- If the array has multiple subscripts, use @code{(i, j, @dots{}) in array}
- to test for existence of an element.
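-
- The following sketch puts these pieces together. The array name
- @code{grid} and the subscripts are invented for the example:
-
- @example
- awk 'BEGIN @{
-     grid[1, 2] = "x"              # stored under the key 1 SUBSEP 2
-     if ((1, 2) in grid)
-         print "found"
-     if ((2, 1) in grid)
-         print "unexpected"
-     else
-         print "missing"
-     for (key in grid) @{
-         split(key, part, SUBSEP)  # recover the individual subscripts
-         print part[1], part[2]
-     @}
- @}'
- @end example
-
- @noindent
- This should print @samp{found}, @samp{missing}, and @samp{1 2}, one per
- line.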
-
- The @code{in} construct may also be used in a @code{for} loop to iterate
- over all the elements of an array. @xref{Scanning an Array}.
-
- An element may be deleted from an array using the @code{delete} statement.
-
- @xref{Arrays}, for more detailed information.
-
- @node Data Type Summary, , Arrays Summary, Variables/Fields
- @appendixsubsec Data Types
-
- The value of an @code{awk} expression is always either a number
- or a string.
-
- Certain contexts (such as arithmetic operators) require numeric
- values. They convert strings to numbers by interpreting the text
- of the string as a numeral. If the string does not look like a
- numeral, it converts to 0.
-
- Certain contexts (such as concatenation) require string values.
- They convert numbers to strings by effectively printing them.
-
- To force conversion of a string value to a number, simply add 0
- to it. If the value you start with is already a number, this
- does not change it.
-
- To force conversion of a numeric value to a string, concatenate it with
- the null string.
-
- The @code{awk} language defines comparisons as being done numerically if
- possible; otherwise, one or both operands are converted to strings and
- a string comparison is performed.
-
- Uninitialized variables have the string value @code{""} (the null, or
- empty, string). In contexts where a number is required, this is
- equivalent to 0.
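-
- The conversion rules can be observed with a short test, sketched here on
- the assumption of a current POSIX-conforming @code{awk}. The variable
- names are arbitrary, and @code{unset} is deliberately never assigned:
-
- @example
- awk 'BEGIN @{
-     a = "10"; b = "9"
-     print (a < b)            # both are strings: "10" sorts before "9"
-     print (a + 0 < b + 0)    # adding 0 forces a numeric comparison
-     x = (unset == 0)         # an uninitialized variable equals 0 ...
-     y = (unset == "")        # ... and also equals the null string
-     print x, y
- @}'
- @end example
-
- @noindent
- This should print @samp{1}, then @samp{0}, then @samp{1 1}.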
-
- @xref{Variables}, for more information on variable naming and initialization;
- @pxref{Conversion}, for more information on how variable values are
- interpreted.@refill
-
- @node Rules Summary, Functions Summary, Variables/Fields, Gawk Summary
- @appendixsec Patterns and Actions
-
- @menu
- * Pattern Summary:: Quick overview of patterns.
- * Regexp Summary:: Quick overview of regular expressions.
- * Actions Summary:: Quick overview of actions.
- @end menu
-
- An @code{awk} program is mostly composed of rules, each consisting of a
- pattern followed by an action. The action is enclosed in @samp{@{} and
- @samp{@}}. Either the pattern may be missing, or the action may be
- missing, but, of course, not both. If the pattern is missing, the
- action is executed for every single line of input. A missing action is
- equivalent to this action,
-
- @example
- @{ print @}
- @end example
-
- @noindent
- which prints the entire line.
-
- Comments begin with the @samp{#} character, and continue until the end of the
- line. Blank lines may be used to separate statements. Normally, a statement
- ends with a newline; however, this is not the case for lines ending in a
- @samp{,}, @samp{@{}, @samp{?}, @samp{:}, @samp{&&}, or @samp{||}. Lines
- ending in @code{do} or @code{else} also have their statements automatically
- continued on the following line. In other cases, a line can be continued by
- ending it with a @samp{\}, in which case the newline is ignored.@refill
-
- Multiple statements may be put on one line by separating them with a @samp{;}.
- This applies to both the statements within the action part of a rule (the
- usual case), and to the rule statements themselves.
-
- @xref{Comments}, for information on @code{awk}'s commenting convention;
- @pxref{Statements/Lines}, for a description of the line continuation
- mechanism in @code{awk}.
-
- @node Pattern Summary, Regexp Summary, Rules Summary, Rules Summary
- @appendixsubsec Patterns
-
- @code{awk} patterns may be one of the following:
-
- @example
- /@var{regular expression}/
- @var{relational expression}
- @var{pattern} && @var{pattern}
- @var{pattern} || @var{pattern}
- @var{pattern} ? @var{pattern} : @var{pattern}
- (@var{pattern})
- ! @var{pattern}
- @var{pattern1}, @var{pattern2}
- BEGIN
- END
- @end example
-
- @code{BEGIN} and @code{END} are two special kinds of patterns that are not
- tested against the input. The action parts of all @code{BEGIN} rules are
- merged as if all the statements had been written in a single @code{BEGIN}
- rule. They are executed before any of the input is read. Similarly, all the
- @code{END} rules are merged, and executed when all the input is exhausted (or
- when an @code{exit} statement is executed). @code{BEGIN} and @code{END}
- patterns cannot be combined with other patterns in pattern expressions.
- @code{BEGIN} and @code{END} rules cannot have missing action parts.@refill
-
- For @samp{/@var{regular-expression}/} patterns, the associated statement is
- executed for each input line that matches the regular expression. Regular
- expressions are the same as those in @code{egrep}, and are summarized below.
-
- A @var{relational expression} may use any of the operators defined below in
- the section on actions. These generally test whether certain fields match
- certain regular expressions.
-
- The @samp{&&}, @samp{||}, and @samp{!} operators are logical ``and'',
- logical ``or'', and logical ``not'', respectively, as in C. They do
- short-circuit evaluation, also as in C, and are used for combining more
- primitive pattern expressions. As in most languages, parentheses may be
- used to change the order of evaluation.
-
- The @samp{?:} operator is like the same operator in C. If the first
- pattern matches, then the second pattern is matched against the input
- record; otherwise, the third is matched. Only one of the second and
- third patterns is matched.
-
- The @samp{@var{pattern1}, @var{pattern2}} form of a pattern is called a
- range pattern. It matches all input lines starting with a line that
- matches @var{pattern1}, and continuing until a line that matches
- @var{pattern2}, inclusive. A range pattern cannot be used as an operand
- to any of the pattern operators.
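-
- For instance, the following sketch uses a range pattern with no action,
- so each selected line is simply printed. The marker words @samp{START}
- and @samp{END} and the input lines are made up for the illustration:
-
- @example
- # print everything from the first START line through the next END line
- printf 'a\nSTART\nb\nEND\nc\n' | awk '/START/, /END/'
- @end example
-
- @noindent
- This should print the three lines @samp{START}, @samp{b}, and @samp{END},
- leaving out @samp{a} and @samp{c}.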
-
- @xref{Patterns}, for a full description of the pattern part of @code{awk}
- rules.
-
- @node Regexp Summary, Actions Summary, Pattern Summary, Rules Summary
- @appendixsubsec Regular Expressions
-
- Regular expressions are the extended kind found in @code{egrep}.
- They are composed of characters as follows:
-
- @table @code
- @item @var{c}
- matches the character @var{c} (assuming @var{c} is a character with no
- special meaning in regexps).
-
- @item \@var{c}
- matches the literal character @var{c}.
-
- @item .
- matches any character except newline.
-
- @item ^
- matches the beginning of a line or a string.
-
- @item $
- matches the end of a line or a string.
-
- @item [@var{abc}@dots{}]
- matches any of the characters @var{abc}@dots{} (character class).
-
- @item [^@var{abc}@dots{}]
- matches any character except @var{abc}@dots{} and newline (negated
- character class).
-
- @item @var{r1}|@var{r2}
- matches either @var{r1} or @var{r2} (alternation).
-
- @item @var{r1r2}
- matches @var{r1}, and then @var{r2} (concatenation).
-
- @item @var{r}+
- matches one or more @var{r}'s.
-
- @item @var{r}*
- matches zero or more @var{r}'s.
-
- @item @var{r}?
- matches zero or one @var{r}'s.
-
- @item (@var{r})
- matches @var{r} (grouping).
- @end table
-
- @xref{Regexp}, for a more detailed explanation of regular expressions.
-
- The escape sequences allowed in string constants are also valid in
- regular expressions (@pxref{Constants}).
-
- @node Actions Summary, , Regexp Summary, Rules Summary
- @appendixsubsec Actions
-
- Action statements are enclosed in braces, @samp{@{} and @samp{@}}.
- Action statements consist of the usual assignment, conditional, and looping
- statements found in most languages. The operators, control statements,
- and input/output statements available are patterned after those in C.
-
- @menu
- * Operator Summary:: @code{awk} operators.
- * Control Flow Summary:: The control statements.
- * I/O Summary:: The I/O statements.
- * Printf Summary:: A summary of @code{printf}.
- * Special File Summary:: Special file names interpreted internally.
- * Numeric Functions Summary:: Built-in numeric functions.
- * String Functions Summary:: Built-in string functions.
- * String Constants Summary:: Escape sequences in strings.
- @end menu
-
- @node Operator Summary, Control Flow Summary, Actions Summary, Actions Summary
- @appendixsubsubsec Operators
-
- The operators in @code{awk}, in order of increasing precedence, are
-
- @table @code
- @item = += -= *= /= %= ^=
- Assignment. Both absolute assignment (@code{@var{var}=@var{value}})
- and operator assignment (the other forms) are supported.
-
- @item ?:
- A conditional expression, as in C. This has the form @code{@var{expr1} ?
- @var{expr2} : @var{expr3}}. If @var{expr1} is true, the value of the
- expression is @var{expr2}; otherwise it is @var{expr3}. Only one of
- @var{expr2} and @var{expr3} is evaluated.@refill
-
- @item ||
- Logical ``or''.
-
- @item &&
- Logical ``and''.
-
- @item ~ !~
- Regular expression match, negated match.
-
- @item < <= > >= != ==
- The usual relational operators.
-
- @item @var{blank}
- String concatenation.
-
- @item + -
- Addition and subtraction.
-
- @item * / %
- Multiplication, division, and modulus.
-
- @item + - !
- Unary plus, unary minus, and logical negation.
-
- @item ^
- Exponentiation (@samp{**} may also be used, and @samp{**=} for the assignment
- operator).
-
- @item ++ --
- Increment and decrement, both prefix and postfix.
-
- @item $
- Field reference.
- @end table
-
- @xref{Expressions}, for a full description of all the operators listed
- above. @xref{Fields}, for a description of the field reference operator.
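-
- A few expressions show how the precedence table plays out in practice.
- This is a sketch, assuming a POSIX-conforming @code{awk}, in which
- exponentiation also groups from right to left:
-
- @example
- awk 'BEGIN @{
-     print 2 ^ 3 ^ 2      # same as 2 ^ (3 ^ 2)
-     print -2 ^ 2         # ^ binds tighter than unary minus: -(2 ^ 2)
-     print 1 " " 2 + 3    # + binds tighter than concatenation
- @}'
- @end example
-
- @noindent
- This should print @samp{512}, @samp{-4}, and @samp{1 5}.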
-
- @node Control Flow Summary, I/O Summary, Operator Summary, Actions Summary
- @appendixsubsubsec Control Statements
-
- The control statements are as follows:
-
- @example
- if (@var{condition}) @var{statement} @r{[} else @var{statement} @r{]}
- while (@var{condition}) @var{statement}
- do @var{statement} while (@var{condition})
- for (@var{expr1}; @var{expr2}; @var{expr3}) @var{statement}
- for (@var{var} in @var{array}) @var{statement}
- break
- continue
- delete @var{array}[@var{index}]
- exit @r{[} @var{expression} @r{]}
- @{ @var{statements} @}
- @end example
-
- @xref{Statements}, for a full description of all the control statements
- listed above.
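-
- As a small illustration of how several of these statements combine, the
- following sketch counts upward inside a @code{do} loop, using
- @code{continue} to skip one value and @code{break} to leave the loop. The
- limits are arbitrary:
-
- @example
- awk 'BEGIN @{
-     i = 0
-     do @{
-         i++
-         if (i == 2)
-             continue     # go straight to the loop test, skipping the print
-         if (i > 4)
-             break        # leave the loop altogether
-         print i
-     @} while (1)
- @}'
- @end example
-
- @noindent
- This should print @samp{1}, @samp{3}, and @samp{4}, one per line.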
-
- @node I/O Summary, Printf Summary, Control Flow Summary, Actions Summary
- @appendixsubsubsec I/O Statements
-
- The input/output statements are as follows:
-
- @table @code
- @item getline
- Set @code{$0} from next input record; set @code{NF}, @code{NR}, @code{FNR}.
-
- @item getline <@var{file}
- Set @code{$0} from next record of @var{file}; set @code{NF}.
-
- @item getline @var{var}
- Set @var{var} from next input record; set @code{NR}, @code{FNR}.
-
- @item getline @var{var} <@var{file}
- Set @var{var} from next record of @var{file}.
-
- @item next
- Stop processing the current input record. The next input record is read and
- processing starts over with the first pattern in the @code{awk} program.
- If the end of the input data is reached, the @code{END} rule(s), if any,
- are executed.
-
- @item print
- Prints the current record.
-
- @item print @var{expr-list}
- Prints expressions.
-
- @item print @var{expr-list} > @var{file}
- Prints expressions on @var{file}.
-
- @item printf @var{fmt, expr-list}
- Format and print.
-
- @item printf @var{fmt, expr-list} > @var{file}
- Format and print on @var{file}.
- @end table
-
- Other input/output redirections are also allowed. For @code{print} and
- @code{printf}, @samp{>> @var{file}} appends output to the @var{file},
- while @samp{| @var{command}} writes on a pipe. In a similar fashion,
- @samp{@var{command} | getline} pipes input into @code{getline}.
- @code{getline} returns 0 on end of file, and @minus{}1 on an error.@refill
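-
- A common idiom built on these return values reads a command's output
- until @code{getline} stops returning a positive value. In this sketch the
- command @samp{echo hello} and the variable names are placeholders:
-
- @example
- awk 'BEGIN @{
-     cmd = "echo hello"
-     # the loop ends on end of file (0) as well as on an error (-1)
-     while ((cmd | getline line) > 0)
-         print "got: " line
-     close(cmd)           # release the pipe once the output is consumed
- @}'
- @end example
-
- @noindent
- This should print @samp{got: hello}.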
-
- @xref{Getline}, for a full description of the @code{getline} statement.
- @xref{Printing}, for a full description of @code{print} and
- @code{printf}. Finally, @pxref{Next Statement}, for a description of
- how the @code{next} statement works.@refill
-
- @node Printf Summary, Special File Summary, I/O Summary, Actions Summary
- @appendixsubsubsec @code{printf} Summary
-
- The @code{awk} @code{printf} statement and @code{sprintf} function
- accept the following conversion specification formats:
-
- @table @code
- @item %c
- An ASCII character. If the argument used for @samp{%c} is numeric, it is
- treated as a character and printed. Otherwise, the argument is assumed to
- be a string, and only the first character of that string is printed.
-
- @item %d
- A decimal number (the integer part).
-
- @item %i
- Also a decimal integer.
-
- @item %e
- A floating point number of the form
- @samp{@r{[}-@r{]}d.ddddddE@r{[}+-@r{]}dd}.@refill
-
- @item %f
- A floating point number of the form
- @r{[}@code{-}@r{]}@code{ddd.dddddd}.
-
- @item %g
- Use @samp{%e} or @samp{%f} conversion, whichever is shorter, with
- nonsignificant zeros suppressed.
-
- @item %o
- An unsigned octal number (again, an integer).
-
- @item %s
- A character string.
-
- @item %x
- An unsigned hexadecimal number (an integer).
-
- @item %X
- Like @samp{%x}, except use @samp{A} through @samp{F} instead of @samp{a}
- through @samp{f} for decimal 10 through 15.@refill
-
- @item %%
- A single @samp{%} character; no argument is converted.
- @end table
-
- There are optional, additional parameters that may lie between the @samp{%}
- and the control letter:
-
- @table @code
- @item -
- The expression should be left-justified within its field.
-
- @item @var{width}
- The field should be padded to this width. If @var{width} has a leading zero,
- then the field is padded with zeros. Otherwise it is padded with blanks.
-
- @item .@var{prec}
- A number indicating the maximum width of a string, or the number of digits
- to the right of the decimal point in a number.
- @end table
-
- @xref{Printf}, for examples and for a more detailed description.
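-
- The conversions and modifiers listed above can be tried out with a short
- @code{BEGIN} rule such as the following sketch. The values and the square
- brackets around each field are arbitrary choices made here to make the
- padding visible.
-

```shell
printf_out=$(awk 'BEGIN {
    printf "[%c]\n", 65            # numeric argument: printed as a character
    printf "[%c]\n", "hello"       # string argument: only the first character
    printf "[%5d]\n", 42           # padded with blanks to width 5
    printf "[%-5d]\n", 42          # left-justified within the field
    printf "[%05d]\n", 42          # leading zero in the width pads with zeros
    printf "[%.2f]\n", 3.14159     # two digits after the decimal point
    printf "[%.3s]\n", "abcdef"    # at most three characters of the string
    printf "[%x %X %o]\n", 255, 255, 8
    printf "[100%%]\n"             # "%%" prints a single percent sign
}')
echo "$printf_out"
```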
-
- @node Special File Summary, Numeric Functions Summary, Printf Summary, Actions Summary
- @appendixsubsubsec Special File Names
-
- When doing I/O redirection from either @code{print} or @code{printf} into a
- file, or via @code{getline} from a file, @code{gawk} recognizes certain special
- file names internally. These file names allow access to open file descriptors
- inherited from @code{gawk}'s parent process (usually the shell). The
- file names are:
-
- @table @file
- @item /dev/stdin
- The standard input.
-
- @item /dev/stdout
- The standard output.
-
- @item /dev/stderr
- The standard error output.
-
- @item /dev/fd/@var{n}
- The file denoted by the open file descriptor @var{n}.
- @end table
-
- @noindent
- These file names may also be used on the command line to name data files.
-
- @xref{Special Files}, for a longer description that provides the motivation
- for this feature.
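-
- A minimal sketch of how these names might be used follows. It sends a
- diagnostic to the standard error output while the data goes to the standard
- output. @code{gawk} interprets the names itself; with other implementations
- the example assumes that the operating system provides the corresponding
- device files.
-

```shell
# The diagnostic goes to standard error (discarded here);
# only the data line reaches standard output.
special_out=$(echo "data" | awk '{
    print "warning: processing record " NR > "/dev/stderr"
    print $0 > "/dev/stdout"
}' 2>/dev/null)
echo "$special_out"
```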
-
- @node Numeric Functions Summary, String Functions Summary, Special File Summary, Actions Summary
- @appendixsubsubsec Numeric Functions
-
- @code{awk} has the following predefined arithmetic functions:
-
- @table @code
- @item atan2(@var{y}, @var{x})
- returns the arctangent of @var{y/x} in radians.
-
- @item cos(@var{expr})
- returns the cosine of @var{expr}, which is in radians.
-
- @item exp(@var{expr})
- the exponential function.
-
- @item int(@var{expr})
- truncates to integer.
-
- @item log(@var{expr})
- the natural logarithm function.
-
- @item rand()
- returns a random number between 0 and 1.
-
- @item sin(@var{expr})
- returns the sine of @var{expr}, which is in radians.
-
- @item sqrt(@var{expr})
- the square root function.
-
- @item srand(@var{expr})
- use @var{expr} as a new seed for the random number generator. If no @var{expr}
- is provided, the time of day is used. The return value is the previous
- seed for the random number generator.
- @end table
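-
- The following sketch exercises most of these functions. The particular
- arguments, and the trick of obtaining pi from @code{atan2}, are
- illustrative choices rather than anything required by @code{awk}.
-

```shell
numeric_out=$(awk 'BEGIN {
    pi = atan2(0, -1)                      # atan2 takes the quadrant into account
    printf "%.4f\n", pi
    printf "%.4f\n", cos(pi)               # the argument is in radians
    printf "%d %d\n", int(3.9), int(-3.9)  # int() truncates toward zero
    printf "%.4f\n", exp(1)
    printf "%.4f\n", log(exp(2))           # log() is the natural logarithm
    printf "%d\n", sqrt(144)
    srand(42)                              # explicit seed instead of the time of day
    r = rand()
    ok = (r >= 0 && r < 1) ? "in range" : "out of range"
    print ok
}')
echo "$numeric_out"
```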
-
- @node String Functions Summary, String Constants Summary, Numeric Functions Summary, Actions Summary
- @appendixsubsubsec String Functions
-
- @code{awk} has the following predefined string functions:
-
- @table @code
- @item gsub(@var{r}, @var{s}, @var{t})
- for each substring matching the regular expression @var{r} in the string
- @var{t}, substitute the string @var{s}, and return the number of substitutions.
- If @var{t} is not supplied, use @code{$0}.
-
- @item index(@var{s}, @var{t})
- returns the index of the string @var{t} in the string @var{s}, or 0 if
- @var{t} is not present.
-
- @item length(@var{s})
- returns the length of the string @var{s}.
-
- @item match(@var{s}, @var{r})
- returns the position in @var{s} where the regular expression @var{r}
- occurs, or 0 if @var{r} is not present, and sets the values of @code{RSTART}
- and @code{RLENGTH}.
-
- @item split(@var{s}, @var{a}, @var{r})
- splits the string @var{s} into the array @var{a} on the regular expression
- @var{r}, and returns the number of fields. If @var{r} is omitted, @code{FS}
- is used instead.
-
- @item sprintf(@var{fmt}, @var{expr-list})
- prints @var{expr-list} according to @var{fmt}, and returns the resulting string.
-
- @item sub(@var{r}, @var{s}, @var{t})
- this is just like @code{gsub}, but only the first matching substring is
- replaced.
-
- @item substr(@var{s}, @var{i}, @var{n})
- returns the @var{n}-character substring of @var{s} starting at @var{i}.
- If @var{n} is omitted, the rest of @var{s} is used.
-
- @item tolower(@var{str})
- returns a copy of the string @var{str}, with all the upper-case characters in
- @var{str} translated to their corresponding lower-case counterparts.
- Nonalphabetic characters are left unchanged.
-
- @item toupper(@var{str})
- returns a copy of the string @var{str}, with all the lower-case characters in
- @var{str} translated to their corresponding upper-case counterparts.
- Nonalphabetic characters are left unchanged.
-
- @item system(@var{cmd-line})
- Execute the command @var{cmd-line}, and return the exit status.
- @end table
-
- @xref{Built-in}, for a description of all of @code{awk}'s built-in functions.
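-
- As an informal illustration, the sketch below applies several of the string
- functions to one sample string. The string and the variable names are
- made up for this example. Function results are stored in variables before
- printing so that the output does not depend on the order in which
- @code{print} evaluates its arguments.
-

```shell
string_out=$(awk 'BEGIN {
    s = "the quick brown fox"
    print index(s, "quick")            # position of the first occurrence
    print length(s)
    print substr(s, 5, 5)              # five characters starting at position 5
    pos = match(s, /b[a-z]+/)          # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = split(s, words, " ")           # words[1] .. words[n]
    print n, words[4]
    t = s
    count = gsub(/o/, "0", t)          # replace every match, return the count
    print count, t
    sub(/quick/, "slow", t)            # replace only the first match
    print t
    print toupper(substr(s, 1, 3)) tolower("DOG")
    print sprintf("%03d", 7)           # format into a string instead of printing
}')
echo "$string_out"
```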
-
- @node String Constants Summary, , String Functions Summary, Actions Summary
- @appendixsubsubsec String Constants
-
- String constants in @code{awk} are sequences of characters enclosed
- between double quotes (@code{"}). Within strings, certain @dfn{escape sequences}
- are recognized, as in C. These are:
-
- @table @code
- @item \\
- A literal backslash.
-
- @item \a
- The ``alert'' character; usually the ASCII BEL character.
-
- @item \b
- Backspace.
-
- @item \f
- Formfeed.
-
- @item \n
- Newline.
-
- @item \r
- Carriage return.
-
- @item \t
- Horizontal tab.
-
- @item \v
- Vertical tab.
-
- @item \x@var{hex digits}
- The character represented by the string of hexadecimal digits following
- the @samp{\x}. As in ANSI C, all following hexadecimal digits are
- considered part of the escape sequence. (This feature should tell us
- something about language design by committee.) E.g., @code{"\x1B"} is a
- string containing the ASCII ESC (escape) character.
-
- @item \@var{ddd}
- The character represented by the 1-, 2-, or 3-digit sequence of octal
- digits. Thus, @code{"\033"} is also a string containing the ASCII ESC
- (escape) character.
-
- @item \@var{c}
- The literal character @var{c}.
- @end table
-
- The escape sequences may also be used inside constant regular expressions
- (e.g., the regexp @code{@w{/[@ \t\f\n\r\v]/}} matches whitespace
- characters).@refill
-
- @xref{Constants}.
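-
- A small sketch of a few of these escape sequences in use is shown here. It
- sticks to the octal form for character codes, and compares against
- @code{sprintf} with @samp{%c} only as a convenient way to check the result.
- The sample strings are arbitrary.
-

```shell
escape_out=$(awk 'BEGIN {
    printf "a\tb\\c\n"                       # a tab, then one literal backslash
    esc = sprintf("%c", 27)                  # character code 27 is ASCII ESC
    if ("\033" == esc) print "octal 033 is ESC"
    if ("\101" == "A") print "octal 101 is A"
    n = split("one two\tthree", parts, /[ \t]/)   # escapes in a regexp constant
    print n
}')
echo "$escape_out"
```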
-
- @node Functions Summary, , Rules Summary, Gawk Summary
- @appendixsec Functions
-
- Functions in @code{awk} are defined as follows:
-
- @example
- function @var{name}(@var{parameter list}) @{ @var{statements} @}
- @end example
-
- Actual parameters supplied in the function call are used to instantiate
- the formal parameters declared in the function. Arrays are passed by
- reference; other variables are passed by value.
-
- If there are fewer arguments passed than there are names in @var{parameter list},
- the extra names are given the null string as value. Extra names have the
- effect of local variables.
-
- The open-parenthesis in a function call must immediately follow the
- function name, without any intervening white space. This is to avoid a
- syntactic ambiguity with the concatenation operator.
-
- The word @code{func} may be used in place of @code{function}.
-
- @xref{User-defined}, for a more complete description.
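-
- The parameter-passing rules described above can be sketched as follows.
- The function names @code{fill} and @code{bump} and all variable names are
- hypothetical. The extra spaces before the last parameter of @code{fill}
- are only a convention for marking it as a local variable.
-

```shell
func_out=$(awk '
# fill() changes its array argument in place, because arrays are passed
# by reference.  The caller never supplies "i", so it acts as a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}

# bump() receives a copy of x, so the variable in the caller is unchanged.
function bump(x) {
    x = x + 1
    return x
}

BEGIN {
    i = 99                    # global i, not touched by the local i in fill()
    fill(squares, 3)
    v = 10
    w = bump(v)
    print squares[3], i, v, w
}')
echo "$func_out"
```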
-
- @node Sample Program, Notes, Gawk Summary, Top
- @appendix Sample Program
-
- The following example is a complete @code{awk} program, which prints
- the number of occurrences of each word in its input. It illustrates the
- associative nature of @code{awk} arrays by using strings as subscripts. It
- also demonstrates the @samp{for @var{x} in @var{array}} construction.
- Finally, it shows how @code{awk} can be used in conjunction with other
- utility programs to do a useful task of some complexity with a minimum of
- effort. Some explanations follow the program listing.@refill
-
- @example
- awk '
- # Print list of word frequencies
- @{
- for (i = 1; i <= NF; i++)
- freq[$i]++
- @}
-
- END @{
- for (word in freq)
- printf "%s\t%d\n", word, freq[word]
- @}'
- @end example
-
- The first thing to notice about this program is that it has two rules. The
- first rule, because it has an empty pattern, is executed on every line of
- the input. It uses @code{awk}'s field-accessing mechanism (@pxref{Fields})
- to pick out the individual words from the line, and the built-in variable
- @code{NF} (@pxref{Built-in Variables}) to know how many fields are available.
-
- For each input word, an element of the array @code{freq} is incremented to
- reflect that the word has been seen an additional time.@refill
-
- The second rule, because it has the pattern @code{END}, is not executed
- until the input has been exhausted. It prints out the contents of the
- @code{freq} table that has been built up inside the first action.@refill
-
- Note that this program has several problems that would prevent it from being
- useful by itself on real text files:@refill
-
- @itemize @bullet
- @item
- Words are detected using the @code{awk} convention that fields are
- separated by whitespace and that other characters in the input (except
- newlines) don't have any special meaning to @code{awk}. This means that
- punctuation characters count as part of words.@refill
-
- @item
- The @code{awk} language considers upper and lower case characters to be
- distinct. Therefore, @samp{foo} and @samp{Foo} are not treated by this
- program as the same word. This is undesirable since in normal text, words
- are capitalized if they begin sentences, and a frequency analyzer should not
- be sensitive to that.@refill
-
- @item
- The output does not come out in any useful order. You're more likely to be
- interested in which words occur most frequently, or in having an alphabetized
- table of how frequently each word occurs.@refill
- @end itemize
-
- The way to solve these problems is to use other system utilities to
- process the input and output of the @code{awk} script. Suppose the
- script shown above is saved in the file @file{frequency.awk}. Then the
- shell command:@refill
-
- @example
- tr A-Z a-z < file1 | tr -cd 'a-z \012' \
- | awk -f frequency.awk \
- | sort +1 -nr
- @end example
-
- @noindent
- produces a table of the words appearing in @file{file1} in order of
- decreasing frequency.
-
- The first @code{tr} command in this pipeline translates all the upper case
- characters in @file{file1} to lower case. The second @code{tr} command
- deletes all the characters in the input except lower case characters,
- spaces, and newlines; the spaces must survive so that @code{awk} can still
- split each line into words. The second argument to the second @code{tr} is
- quoted to protect the space and the backslash in it from being interpreted
- by the shell. The @code{awk} program reads this suitably massaged data and
- produces a word frequency table, which is not ordered.
-
- The @code{awk} script's output is now sorted by the @code{sort} command and
- printed on the terminal. The options given to @code{sort} in this example
- specify to sort by the second field of each input line (skipping one field),
- that the sort keys should be treated as numeric quantities (otherwise
- @samp{15} would come before @samp{5}), and that the sorting should be done
- in descending (reverse) order.@refill
-
- See the general operating system documentation for more information on how
- to use the @code{tr} and @code{sort} commands.@refill
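-
- As an alternative sketch, the case folding and punctuation removal can be
- done inside the @code{awk} program itself with @code{tolower} and
- @code{gsub}, leaving only the ordering to @code{sort}. The sample input
- text is invented for this illustration, and a plain alphabetical
- @code{sort} is used to keep the output order predictable.
-

```shell
freq_out=$(printf 'The cat. The dog!\nthe END\n' | awk '
{
    $0 = tolower($0)              # fold case so "The" and "the" are one word
    gsub(/[^a-z \t]/, "", $0)     # drop punctuation, keep the separators
    for (i = 1; i <= NF; i++)     # changing $0 makes awk split the fields again
        freq[$i]++
}

END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}' | sort)
echo "$freq_out"
```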
-
- @ignore
- @strong{ADR: I have some more substantial programs courtesy of Rick Adams
- at UUNET. I am planning on incorporating those either in addition to or
- instead of this program.}
-
- @strong{I would also like to incorporate the general @code{translate}
- function that I have written.}
- @end ignore
-
- @node Notes, Glossary, Sample Program, Top
- @appendix Implementation Notes
-
- This appendix contains information mainly of interest to implementors and
- maintainers of @code{gawk}. Everything in it applies specifically to
- @code{gawk}, and not to other implementations.
-
- @menu
- * Compatibility Mode:: How to disable certain @code{gawk} extensions.
-
- * Future Extensions:: New features we may implement soon.
-
- * Improvements:: Suggestions for improvements by volunteers.
- @end menu
-
- @node Compatibility Mode, Future Extensions, Notes, Notes
- @appendixsec Downwards Compatibility and Debugging
-
- @xref{S5R4/GNU}, for a summary of the GNU extensions to the @code{awk}
- language and program. All of these features can be turned off either by
- compiling @code{gawk} with @samp{-DSTRICT} (not recommended), or by
- invoking @code{gawk} with the @samp{-c} option.@refill
-
- If @code{gawk} is compiled for debugging with @samp{-DDEBUG}, then there
- are two more options available on the command line.
-
- @table @samp
- @item -d
- Print out debugging information during execution.
-
- @item -D
- Print out the parse stack information as the program is being parsed.
- @end table
-
- Both of these options are intended only for serious @code{gawk} developers,
- and not for the casual user. They probably have not even been compiled into
- your version of @code{gawk}, since they slow down execution.
-
- The code for recognizing special file names such as @file{/dev/stdin}
- can be disabled at compile time with @samp{-DNO_DEV_FD}, or with
- @samp{-DSTRICT}.@refill
-
- @node Future Extensions, Improvements, Compatibility Mode, Notes
- @appendixsec Probable Future Extensions
-
- This section briefly lists extensions that indicate the directions we are
- currently considering for @code{gawk}.
-
- @table @asis
- @item ANSI C compatible @code{printf}
- The @code{printf} and @code{sprintf} functions may be enhanced to be
- fully compatible with the specification for the @code{printf} family
- of functions in ANSI C.@refill
-
- @item @code{RS} as a regexp
- The meaning of @code{RS} may be generalized along the lines of @code{FS}.
-
- @item Control of subprocess environment
- Changes made in @code{gawk} to the array @code{ENVIRON} may be
- propagated to subprocesses run by @code{gawk}.
-
- @item Data bases
- It may be possible to map an NDBM/GDBM file into an @code{awk} array.
-
- @item Single-character fields
- The null string, @code{""}, as a field separator, will cause field
- splitting and the split function to separate individual characters.
- Thus, @code{split("abcd", a, "")} would yield @code{a[1] == "a"},
- @code{a[2] == "b"}, and so on.
-
- @item Fixed-length fields and records
- A mechanism may be provided to allow the specification of fixed length
- fields and records.
-
- @item Regexp syntax
- The @code{egrep} syntax for regular expressions, now specified
- with the @samp{-e} option, may become the default, since the
- POSIX standard may specify this.
-
- @c this is @emph{very} long term --- not worth including right now.
- @ignore
- @item The C Comma Operator
- We may add the C comma operator, which takes the form
- @code{@var{expr1},@var{expr2}}. The first expression is evaluated, and the
- result is thrown away. The value of the full expression is the value of
- @var{expr2}.@refill
- @end ignore
- @end table
-
- @node Improvements,, Future Extensions, Notes
- @appendixsec Suggestions for Improvements
-
- Here are some projects that would-be @code{gawk} hackers might like to take
- on. They vary in size from a few days to a few weeks of programming,
- depending on which one you choose and how fast a programmer you are. Please
- send any improvements you write to the maintainers at the GNU
- project.@refill
-
- @enumerate
- @item
- State machine regexp matcher: At present, @code{gawk} uses the
- backtracking regular expression matcher from the GNU subroutine library.
- If a regexp is going to be used many times, it is faster to
- convert it once to a description of a finite state machine, then run a
- routine simulating that machine every time you want to match the regexp.
- You might be able to use the matching routines used by GNU @code{egrep}.
-
- @item
- Compilation of @code{awk} programs: @code{gawk} uses a Bison (YACC-like)
- parser to convert the script given it into a syntax tree; the syntax
- tree is then executed by a simple recursive evaluator. Both of these
- steps incur a lot of overhead, since parsing can be slow (especially if
- you also do the previous project and convert regular expressions to
- finite state machines at compile time) and the recursive evaluator
- performs many procedure calls to do even the simplest things.@refill
-
- It should be possible for @code{gawk} to convert the script's parse tree
- into a C program which the user would then compile, using the normal
- C compiler and a special @code{gawk} library to provide all the needed
- functions (regexps, fields, associative arrays, type coercion, and so
- on).@refill
-
- An easier possibility might be for an intermediate phase of @code{gawk} to
- convert the parse tree into a linear byte code form like the one used
- in GNU Emacs Lisp. The recursive evaluator would then be replaced by
- a straight line byte code interpreter that would be intermediate in speed
- between running a compiled program and doing what @code{gawk} does
- now.@refill
-
- @item
- An error message section has not been included in this version of the
- manual. Perhaps some nice beta testers will document some of the messages
- for the future.
- @end enumerate
-
- @node Glossary, Index , Notes, Top
- @appendix Glossary
-
- @table @asis
- @item Action
- A series of @code{awk} statements attached to a rule. If the rule's
- pattern matches an input record, the @code{awk} language executes the
- rule's action. Actions are always enclosed in curly braces.
- @xref{Actions}.@refill
-
- @item Amazing @code{awk} Assembler
- Henry Spencer at the University of Toronto wrote a retargetable assembler
- completely as @code{awk} scripts. It is thousands of lines long, including
- machine descriptions for several 8-bit microcomputers. It is distributed
- with @code{gawk} and is a good example of a program that would have been
- better written in another language.@refill
-
- @item Assignment
- An @code{awk} expression that changes the value of some @code{awk}
- variable or data object. An object that you can assign to is called an
- @dfn{lvalue}. @xref{Assignment Ops}.@refill
-
- @item @code{awk} Language
- The language in which @code{awk} programs are written.
-
- @item @code{awk} Program
- An @code{awk} program consists of a series of @dfn{patterns} and
- @dfn{actions}, collectively known as @dfn{rules}. For each input record
- given to the program, the program's rules are all processed in turn.
- @code{awk} programs may also contain function definitions.@refill
-
- @item @code{awk} Script
- Another name for an @code{awk} program.
-
- @item Built-in Function
- The @code{awk} language provides built-in functions that perform various
- numerical and string computations. Examples are @code{sqrt} (for the
- square root of a number) and @code{substr} (for a substring of a
- string). @xref{Built-in}.@refill
-
- @item Built-in Variable
- The variables @code{ARGC}, @code{ARGV}, @code{ENVIRON}, @code{FILENAME},
- @code{FNR}, @code{FS}, @code{NF}, @code{IGNORECASE}, @code{NR}, @code{OFMT},
- @code{OFS}, @code{ORS}, @code{RLENGTH}, @code{RSTART}, @code{RS}, and
- @code{SUBSEP} have special meaning to @code{awk}. Changing some of them
- affects @code{awk}'s running environment. @xref{Built-in Variables}.@refill
-
- @item C
- The system programming language that most GNU software is written in. The
- @code{awk} programming language has C-like syntax, and this manual
- points out similarities between @code{awk} and C when appropriate.@refill
-
- @item Compound Statement
- A series of @code{awk} statements, enclosed in curly braces. Compound
- statements may be nested. @xref{Statements}.@refill
-
- @item Concatenation
-