Text File | 1990-06-07 | 48.5 KB | 1,401 lines |
- often necessary to ensure that it happens where you want it to by
- enclosing the items to be concatenated in parentheses. For example, the
- following code fragment does not concatenate @code{file} and @code{name}
- as you might expect:
-
- @example
- file = "file"
- name = "name"
- print "something meaningful" > file name
- @end example
-
- @noindent
- It is necessary to use the following:
-
- @example
- print "something meaningful" > (file name)
- @end example
-
- We recommend you use parentheses around concatenation in all but the
- most common contexts (such as in the right-hand operand of @samp{=}).
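-
- A minimal sketch of the parenthesized form, assuming a POSIX shell with
- @code{mktemp} available. The scratch directory and the variable
- @code{dir} are illustrative and not part of the fragment above:
-
```shell
# Work in a scratch directory so the example leaves nothing behind.
dir=$(mktemp -d)
awk -v dir="$dir" 'BEGIN {
    file = dir "/file"
    name = "name"
    # Parentheses force the concatenation to happen before the redirection.
    print "something meaningful" > (file name)
    close(file name)
}'
out=$(cat "$dir/filename")
echo "$out"    # prints: something meaningful
rm -r "$dir"
```
-
- The output lands in a single file whose name is the concatenation of
- the two variables.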
-
- @ignore
- @code{gawk} actually now allows a concatenation on the right hand
- side of a @code{>} redirection, but other @code{awk}s don't. So for
- now we won't mention that fact.
- @end ignore
-
- @node Comparison Ops, Boolean Ops, Concatenation, Expressions
- @section Comparison Expressions
- @cindex comparison expressions
- @cindex expressions, comparison
- @cindex relational operators
- @cindex operators, relational
- @cindex regexp operators
-
- @dfn{Comparison expressions} compare strings or numbers for
- relationships such as equality. They are written using @dfn{relational
- operators}, which are a superset of those in C. Here is a table of
- them:
-
- @table @code
- @item @var{x} < @var{y}
- True if @var{x} is less than @var{y}.
-
- @item @var{x} <= @var{y}
- True if @var{x} is less than or equal to @var{y}.
-
- @item @var{x} > @var{y}
- True if @var{x} is greater than @var{y}.
-
- @item @var{x} >= @var{y}
- True if @var{x} is greater than or equal to @var{y}.
-
- @item @var{x} == @var{y}
- True if @var{x} is equal to @var{y}.
-
- @item @var{x} != @var{y}
- True if @var{x} is not equal to @var{y}.
-
- @item @var{x} ~ @var{y}
- True if the string @var{x} matches the regexp denoted by @var{y}.
-
- @item @var{x} !~ @var{y}
- True if the string @var{x} does not match the regexp denoted by @var{y}.
-
- @item @var{subscript} in @var{array}
- True if array @var{array} has an element with the subscript @var{subscript}.
- @end table
-
- Comparison expressions have the value 1 if true and 0 if false.
-
- The operands of a relational operator are compared as numbers if they
- are both numbers. Otherwise they are converted to, and compared as,
- strings (@pxref{Conversion}). Strings are compared by comparing the
- first character of each, then the second character of each, and so on.
- Thus, @code{"10"} is less than @code{"9"}.
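-
- The difference between the two kinds of comparison can be seen in a
- short sketch. The variable names @code{s} and @code{n} are
- illustrative only:
-
```shell
out=$(awk 'BEGIN {
    s = ("10" < "9")    # string comparison: "1" sorts before "9"
    n = (10 < 9)        # numeric comparison: 10 is not less than 9
    print s, n
}')
echo "$out"    # prints: 1 0
```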
-
- For example,
-
- @example
- $1 == "foo"
- @end example
-
- @noindent
- has the value of 1, or is true, if the first field of the current input
- record is precisely @samp{foo}. By contrast,
-
- @example
- $1 ~ /foo/
- @end example
-
- @noindent
- has the value 1 if the first field contains @samp{foo}.
-
- The right hand operand of the @samp{~} and @samp{!~} operators may be
- either a constant regexp (@code{/@dots{}/}), or it may be an ordinary
- expression, in which case the value of the expression as a string is a
- dynamic regexp (@pxref{Regexp Usage}).
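-
- As a hedged illustration, the following sketch applies an exact
- comparison, a constant regexp and a dynamic regexp to the same input.
- The sample records and the variable names (@code{re}, @code{exact},
- @code{partial}, @code{dynamic}) are assumptions made for the example:
-
```shell
out=$(printf 'foo\nfoobar\nbar\n' | awk '{
    re = "^foo"                 # dynamic regexp held in a variable
    exact   = ($1 == "foo")     # whole field must equal foo
    partial = ($1 ~ /foo/)      # field must contain foo somewhere
    dynamic = ($1 ~ re)         # field must start with foo
    print $1, exact, partial, dynamic
}')
echo "$out"
```
-
- Only the first record passes all three tests; @samp{foobar} fails the
- exact comparison but matches both regexps.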
-
- @cindex regexp as expression
- In very recent implementations of @code{awk}, a constant regular
- expression in slashes by itself is also an expression. The regexp
- @code{/@var{regexp}/} is an abbreviation for this comparison expression:
-
- @example
- $0 ~ /@var{regexp}/
- @end example
-
- In some contexts it may be necessary to write parentheses around the
- regexp to avoid confusing the @code{gawk} parser. For example,
- @code{(/x/ - /y/) > threshold} is not allowed, but @code{((/x/) - (/y/))
- > threshold} parses properly.
-
- One special place where @code{/foo/} is @emph{not} an abbreviation for
- @code{$0 ~ /foo/} is when it is the right-hand operand of @samp{~} or
- @samp{!~}!
-
- @node Boolean Ops, Assignment Ops, Comparison Ops, Expressions
- @section Boolean Expressions
- @cindex expressions, boolean
- @cindex boolean expressions
- @cindex operators, boolean
- @cindex boolean operators
- @cindex logical operations
- @cindex and operator
- @cindex or operator
- @cindex not operator
-
- A @dfn{boolean expression} is a combination of comparison expressions or
- matching expressions, using the @dfn{boolean operators} ``or''
- (@samp{||}), ``and'' (@samp{&&}), and ``not'' (@samp{!}), along with
- parentheses to control nesting. The truth of the boolean expression is
- computed by combining the truth values of the component expressions.
-
- Boolean expressions can be used wherever comparison and matching
- expressions can be used. They can be used in @code{if} and @code{while}
- statements. They have numeric values (1 if true, 0 if false), which
- come into play if the result of the boolean expression is stored in a
- variable, or used in arithmetic.
-
- In addition, every boolean expression is also a valid boolean pattern, so
- you can use it as a pattern to control the execution of rules.
-
- Here are descriptions of the three boolean operators, with an example of
- each. It may be instructive to compare these examples with the
- analogous examples of boolean patterns (@pxref{Boolean Patterns}), which
- use the same boolean operators in patterns instead of expressions.
-
- @table @code
- @item @var{boolean1} && @var{boolean2}
- True if both @var{boolean1} and @var{boolean2} are true. For example,
- the following statement prints the current input record if it contains
- both @samp{2400} and @samp{foo}.@refill
-
- @example
- if ($0 ~ /2400/ && $0 ~ /foo/) print
- @end example
-
- The subexpression @var{boolean2} is evaluated only if @var{boolean1}
- is true. This can make a difference when @var{boolean2} contains
- expressions that have side effects: in the case of @code{$0 ~ /foo/ &&
- ($2 == bar++)}, the variable @code{bar} is not incremented if there is
- no @samp{foo} in the record.
-
- @item @var{boolean1} || @var{boolean2}
- True if at least one of @var{boolean1} and @var{boolean2} is true.
- For example, the following command prints all records in the input
- file @file{BBS-list} that contain @emph{either} @samp{2400} or
- @samp{foo}, or both.@refill
-
- @example
- awk '@{ if ($0 ~ /2400/ || $0 ~ /foo/) print @}' BBS-list
- @end example
-
- The subexpression @var{boolean2} is evaluated only if @var{boolean1}
- is false. This can make a difference when @var{boolean2} contains
- expressions that have side effects.
-
- @item !@var{boolean}
- True if @var{boolean} is false. For example, the following program prints
- all records in the input file @file{BBS-list} that do @emph{not} contain the
- string @samp{foo}.
-
- @example
- awk '@{ if (! ($0 ~ /foo/)) print @}' BBS-list
- @end example
- @end table
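-
- The short-circuit behavior described in the table can be observed with
- a side effect. This is only a sketch; the variables @code{x}, @code{y}
- and @code{z} are introduced for illustration:
-
```shell
out=$(awk 'BEGIN {
    bar = 0
    x = (0 && bar++)    # left side false: bar++ is never evaluated
    y = (1 || bar++)    # left side true: bar++ is never evaluated
    z = (1 && bar++)    # bar++ runs; its value is the old value, 0
    print bar, x, y, z
}')
echo "$out"    # prints: 1 0 1 0
```
-
- The variable @code{bar} is incremented exactly once, by the last of the
- three expressions.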
-
- @node Assignment Ops, Increment Ops, Boolean Ops, Expressions
- @section Assignment Expressions
- @cindex assignment operators
- @cindex operators, assignment
- @cindex expressions, assignment
-
- An @dfn{assignment} is an expression that stores a new value into a
- variable. For example, let's assign the value 1 to the variable
- @code{z}:@refill
-
- @example
- z = 1
- @end example
-
- After this expression is executed, the variable @code{z} has the value 1.
- Whatever old value @code{z} had before the assignment is forgotten.
-
- Assignments can store string values also. For example, this would store
- the value @code{"this food is good"} in the variable @code{message}:
-
- @example
- thing = "food"
- predicate = "good"
- message = "this " thing " is " predicate
- @end example
-
- @noindent
- (This also illustrates concatenation of strings.)
-
- The @samp{=} sign is called an @dfn{assignment operator}. It is the
- simplest assignment operator because the value of the right-hand
- operand is stored unchanged.
-
- @cindex side effect
- Most operators (addition, concatenation, and so on) have no effect
- except to compute a value. If you ignore the value, you might as well
- not use the operator. An assignment operator is different; it does
- produce a value, but even if you ignore the value, the assignment still
- makes itself felt through the alteration of the variable. We call this
- a @dfn{side effect}.
-
- @cindex lvalue
- The left-hand operand of an assignment need not be a variable
- (@pxref{Variables}); it can also be a field (@pxref{Changing Fields}) or
- an array element (@pxref{Arrays}). These are all called @dfn{lvalues},
- which means they can appear on the left-hand side of an assignment operator.
- The right-hand operand may be any expression; it produces the new value
- which the assignment stores in the specified variable, field or array
- element.
-
- It is important to note that variables do @emph{not} have permanent types.
- The type of a variable is simply the type of whatever value it happens
- to hold at the moment. In the following program fragment, the variable
- @code{foo} has a numeric value at first, and a string value later on:
-
- @example
- foo = 1
- print foo
- foo = "bar"
- print foo
- @end example
-
- @noindent
- When the second assignment gives @code{foo} a string value, the fact that
- it previously had a numeric value is forgotten.
-
- An assignment is an expression, so it has a value: the same value that
- is assigned. Thus, @code{z = 1} as an expression has the value 1.
- One consequence of this is that you can write multiple assignments together:
-
- @example
- x = y = z = 0
- @end example
-
- @noindent
- stores the value 0 in all three variables. It does this because the
- value of @code{z = 0}, which is 0, is stored into @code{y}, and then
- the value of @code{y = z = 0}, which is 0, is stored into @code{x}.
-
- You can use an assignment anywhere an expression is called for. For
- example, it is valid to write @code{x != (y = 1)} to set @code{y} to 1
- and then test whether @code{x} equals 1. But this style tends to make
- programs hard to read; except in a one-shot program, you should
- rewrite it to get rid of such nesting of assignments. This is never very
- hard.
-
- Aside from @samp{=}, there are several other assignment operators that
- do arithmetic with the old value of the variable. For example, the
- operator @samp{+=} computes a new value by adding the right-hand value
- to the old value of the variable. Thus, the following assignment adds
- 5 to the value of @code{foo}:
-
- @example
- foo += 5
- @end example
-
- @noindent
- This is precisely equivalent to the following:
-
- @example
- foo = foo + 5
- @end example
-
- @noindent
- Use whichever one makes the meaning of your program clearer.
-
- Here is a table of the arithmetic assignment operators. In each
- case, the right-hand operand is an expression whose value is converted
- to a number.
-
- @table @code
- @item @var{lvalue} += @var{increment}
- Adds @var{increment} to the value of @var{lvalue} to make the new value
- of @var{lvalue}.
-
- @item @var{lvalue} -= @var{decrement}
- Subtracts @var{decrement} from the value of @var{lvalue}.
-
- @item @var{lvalue} *= @var{coefficient}
- Multiplies the value of @var{lvalue} by @var{coefficient}.
-
- @item @var{lvalue} /= @var{quotient}
- Divides the value of @var{lvalue} by @var{quotient}.
-
- @item @var{lvalue} %= @var{modulus}
- Sets @var{lvalue} to its remainder by @var{modulus}.
-
- @item @var{lvalue} ^= @var{power}
- @itemx @var{lvalue} **= @var{power}
- Raises @var{lvalue} to the power @var{power}.
- @end table
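-
- A brief sketch that applies each arithmetic assignment operator in turn
- to one variable. The starting value 7 is arbitrary, and @samp{^=} is
- used rather than @samp{**=} on the assumption that it is the more widely
- accepted spelling:
-
```shell
out=$(awk 'BEGIN {
    foo = 7
    foo += 5; print foo    # 12
    foo -= 2; print foo    # 10
    foo *= 3; print foo    # 30
    foo /= 4; print foo    # 7.5
    foo %= 4; print foo    # 3.5
    foo ^= 2; print foo    # 12.25
}')
echo "$out"
```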
-
- @node Increment Ops, Conversion, Assignment Ops, Expressions
- @section Increment Operators
-
- @cindex increment operators
- @cindex operators, increment
- @dfn{Increment operators} increase or decrease the value of a variable
- by 1. You could do the same thing with an assignment operator, so
- the increment operators add no power to the @code{awk} language; but they
- are convenient abbreviations for something very common.
-
- The operator to add 1 is written @samp{++}. It can be used to increment
- a variable either before or after taking its value.
-
- To pre-increment a variable @var{v}, write @code{++@var{v}}. This adds
- 1 to the value of @var{v} and that new value is also the value of this
- expression. The assignment expression @code{@var{v} += 1} is completely
- equivalent.
-
- Writing the @samp{++} after the variable specifies post-increment. This
- increments the variable value just the same; the difference is that the
- value of the increment expression itself is the variable's @emph{old}
- value. Thus, if @code{foo} has value 4, then the expression @code{foo++}
- has the value 4, but it changes the value of @code{foo} to 5.
-
- The post-increment @code{foo++} is nearly equivalent to writing @code{(foo
- += 1) - 1}. It is not perfectly equivalent because all numbers in
- @code{awk} are floating point: in floating point, @code{foo + 1 - 1} does
- not necessarily equal @code{foo}. But the difference is minute as
- long as you stick to numbers that are fairly small (less than a trillion).
-
- Any lvalue can be incremented. Fields and array elements are incremented
- just like variables.
-
- The decrement operator @samp{--} works just like @samp{++} except that
- it subtracts 1 instead of adding. Like @samp{++}, it can be used before
- the lvalue to pre-decrement or after it to post-decrement.
-
- Here is a summary of increment and decrement expressions.
-
- @table @code
- @item ++@var{lvalue}
- This expression increments @var{lvalue} and the new value becomes the
- value of this expression.
-
- @item @var{lvalue}++
- This expression causes the contents of @var{lvalue} to be incremented.
- The value of the expression is the @emph{old} value of @var{lvalue}.
-
- @item --@var{lvalue}
- Like @code{++@var{lvalue}}, but instead of adding, it subtracts. It
- decrements @var{lvalue} and delivers the value that results.
-
- @item @var{lvalue}--
- Like @code{@var{lvalue}++}, but instead of adding, it subtracts. It
- decrements @var{lvalue}. The value of the expression is the @emph{old}
- value of @var{lvalue}.
- @end table
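-
- The four forms in the table can be compared side by side. In this
- sketch the variables @code{a} through @code{d} exist only to capture
- the value of each expression:
-
```shell
out=$(awk 'BEGIN {
    foo = 4
    a = foo++    # a gets the old value 4; foo becomes 5
    b = ++foo    # foo becomes 6; b gets the new value 6
    c = foo--    # c gets the old value 6; foo becomes 5
    d = --foo    # foo becomes 4; d gets the new value 4
    print a, b, c, d, foo
}')
echo "$out"    # prints: 4 6 6 4 4
```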
-
- @node Conversion, Conditional Exp, Increment Ops, Expressions
- @section Conversion of Strings and Numbers
-
- @cindex conversion of strings and numbers
- Strings are converted to numbers, and numbers to strings, if the context
- of the @code{awk} program demands it. For example, if the value of
- either @code{foo} or @code{bar} in the expression @code{foo + bar}
- happens to be a string, it is converted to a number before the addition
- is performed. If numeric values appear in string concatenation, they
- are converted to strings. Consider this:@refill
-
- @example
- two = 2; three = 3
- print (two three) + 4
- @end example
-
- @noindent
- This eventually prints the (numeric) value 27. The numeric values of
- the variables @code{two} and @code{three} are converted to strings and
- concatenated together, and the resulting string is converted back to the
- number 23, to which 4 is then added.
-
- If, for some reason, you need to force a number to be converted to a
- string, concatenate the null string with that number. To force a string
- to be converted to a number, add zero to that string.
-
- Strings are converted to numbers by interpreting them as numerals:
- @code{"2.5"} converts to 2.5, and @code{"1e3"} converts to 1000.
- Strings that can't be interpreted as valid numbers are converted to
- zero.
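-
- The conversions described so far can be tried out in a short sketch.
- The variable names @code{n} and @code{s} are illustrative:
-
```shell
out=$(awk 'BEGIN {
    two = 2; three = 3
    n = (two three) + 4        # "23" becomes 23, then 4 is added
    print n
    # Adding zero forces a string to be treated as a number.
    print "2.5" + 0, "1e3" + 0, "abc" + 0
    s = 12 ""                  # concatenating the null string yields a string
    print length(s)
}')
echo "$out"
```
-
- The three lines of output are @samp{27}, @samp{2.5 1000 0} and @samp{2}.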
-
- @vindex OFMT
- The exact manner in which numbers are converted into strings is controlled
- by the @code{awk} built-in variable @code{OFMT} (@pxref{Built-in Variables}).
- Numbers are converted using a special
- version of the @code{sprintf} function (@pxref{Built-in}) with @code{OFMT}
- as the format specifier.@refill
-
- @code{OFMT}'s default value is @code{"%.6g"}, which prints a value with
- at most six significant digits. For some applications you will want to
- change it to specify more precision. Double precision on most modern
- machines gives you 16 or 17 decimal digits of precision.
-
- Strange results can happen if you set @code{OFMT} to a string that doesn't
- tell @code{sprintf} how to format floating point numbers in a useful way.
- For example, if you forget the @samp{%} in the format, all numbers will be
- converted to the same constant string.@refill
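-
- A small sketch of changing @code{OFMT}. It observes the effect through
- @code{print}; the value of @code{x} and the replacement format
- @code{"%.10g"} are arbitrary choices for the example:
-
```shell
out=$(awk 'BEGIN {
    x = 3.14159265358979
    print x            # default OFMT is "%.6g"
    OFMT = "%.10g"
    print x            # now ten significant digits
}')
echo "$out"
```
-
- The first line of output is @samp{3.14159} and the second is
- @samp{3.141592654}.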
-
- @node Conditional Exp, Function Calls, Conversion, Expressions
- @section Conditional Expressions
- @cindex conditional expression
- @cindex expression, conditional
-
- A @dfn{conditional expression} is a special kind of expression with
- three operands. It allows you to use one expression's value to select
- one of two other expressions.
-
- The conditional expression looks the same as in the C language:
-
- @example
- @var{selector} ? @var{if-true-exp} : @var{if-false-exp}
- @end example
-
- @noindent
- There are three subexpressions. The first, @var{selector}, is always
- computed first. If it is ``true'' (not zero and not the null string) then
- @var{if-true-exp} is computed next and its value becomes the value of the
- whole expression.
- Otherwise, @var{if-false-exp} is computed next and its value becomes the
- value of the whole expression.
-
- For example, this expression produces the absolute value of @code{x}:
-
- @example
- x > 0 ? x : -x
- @end example
-
- Each time the conditional expression is computed, exactly one of
- @var{if-true-exp} and @var{if-false-exp} is computed; the other is ignored.
- This is important when the expressions contain side effects. For example,
- this conditional expression examines element @code{i} of either array
- @code{a} or array @code{b}, and increments @code{i}.
-
- @example
- x == y ? a[i++] : b[i++]
- @end example
-
- @noindent
- This is guaranteed to increment @code{i} exactly once, because each time
- one or the other of the two increment expressions is executed,
- and the other is not.
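-
- Both uses can be combined in one sketch. The array contents and the
- names @code{abs} and @code{picked} are assumptions made for the
- example:
-
```shell
out=$(awk 'BEGIN {
    x = -5
    abs = (x > 0 ? x : -x)                 # absolute value of x
    a[0] = "from a"; b[0] = "from b"
    i = 0; y = -5
    picked = (x == y ? a[i++] : b[i++])    # only one branch is evaluated
    print abs, picked, i
}')
echo "$out"    # prints: 5 from a 1
```
-
- The final value of @code{i} is 1, showing that only one of the two
- increments took place.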
-
- @node Function Calls, Precedence, Conditional Exp, Expressions
- @section Function Calls
- @cindex function call
- @cindex calling a function
-
- A @dfn{function} is a name for a particular calculation. Because it has
- a name, you can ask for it by name at any point in the program. For
- example, the function @code{sqrt} computes the square root of a number.
-
- A fixed set of functions are @dfn{built in}, which means they are
- available in every @code{awk} program. The @code{sqrt} function is one
- of these. @xref{Built-in}, for a list of built-in functions and their
- descriptions. In addition, you can define your own functions in the
- program for use elsewhere in the same program. @xref{User-defined},
- for how to do this.
-
- @cindex arguments in function call
- The way to use a function is with a @dfn{function call} expression,
- which consists of the function name followed by a list of
- @dfn{arguments} in parentheses. The arguments are expressions which
- give the raw materials for the calculation that the function will do.
- When there is more than one argument, they are separated by commas. If
- there are no arguments, write just @samp{()} after the function name.
- Here are some examples:
-
- @example
- sqrt(x**2 + y**2) # @r{One argument}
- atan2(y, x) # @r{Two arguments}
- rand() # @r{No arguments}
- @end example
-
- @strong{Do not put any space between the function name and the
- open-parenthesis!} A user-defined function name looks just like the name of
- a variable, and space would make the expression look like concatenation
- of a variable with an expression inside parentheses. Space before the
- parenthesis is harmless with built-in functions, but it is best not to get
- into the habit of using space, lest you do likewise for a user-defined
- function one day by mistake.
-
- Each function expects a particular number of arguments. For example, the
- @code{sqrt} function must be called with a single argument, the number
- to take the square root of:
-
- @example
- sqrt(@var{argument})
- @end example
-
- Some of the built-in functions allow you to omit the final argument.
- If you do so, they use a reasonable default. @xref{Built-in},
- for full details. If arguments are omitted in calls to user-defined
- functions, then those arguments are treated as local variables,
- initialized to the null string (@pxref{User-defined}).
-
- Like every other expression, the function call has a value, which is
- computed by the function based on the arguments you give it. In this
- example, the value of @code{sqrt(@var{argument})} is the square root of the
- argument. A function can also have side effects, such as assigning
- values to certain variables or performing I/O.
-
- Here is a command to read numbers, one number per line, and print the
- square root of each one:
-
- @example
- awk '@{ print "The square root of", $1, "is", sqrt($1) @}'
- @end example
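-
- The command above reads from standard input, so it can be fed from a
- pipe. The sketch below does that and also calls a user-defined
- function; @code{hypot} is a hypothetical helper written for this
- example, not a built-in:
-
```shell
out=$(printf '4\n9\n16\n' | awk '
# hypot is a hypothetical user-defined helper, not a built-in.
function hypot(x, y) { return sqrt(x^2 + y^2) }
{ print "The square root of", $1, "is", sqrt($1) }
END { print hypot(3, 4) }')
echo "$out"
```
-
- Note that there is no space between @code{hypot} and the
- open-parenthesis in the call.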
-
- @node Precedence,, Function Calls, Expressions
- @section Operator Precedence: How Operators Nest
- @cindex precedence
- @cindex operator precedence
-
- @dfn{Operator precedence} determines how operators are grouped, when
- different operators appear close by in one expression. For example,
- @samp{*} has higher precedence than @samp{+}; thus, @code{a + b * c}
- means to multiply @code{b} and @code{c}, and then add @code{a} to the
- product.
-
- You can overrule the precedence of the operators by writing parentheses
- yourself. You can think of the precedence rules as saying where the
- parentheses are assumed if you do not write parentheses yourself. In
- fact, it is wise always to use parentheses whenever you have an unusual
- combination of operators, because other people who read the program may
- not remember what the precedence is in this case. You might forget,
- too; then you could make a mistake. Explicit parentheses will prevent
- any such mistake.
-
- When operators of equal precedence are used together, the leftmost
- operator groups first, except for the assignment, conditional and
- exponentiation operators, which group in the opposite order.
- Thus, @code{a - b + c} groups as @code{(a - b) + c};
- @code{a = b = c} groups as @code{a = (b = c)}.
-
- The precedence of prefix unary operators does not matter as long as only
- unary operators are involved, because there is only one way to parse
- them---innermost first. Thus, @code{$++i} means @code{$(++i)} and
- @code{++$x} means @code{++($x)}. However, when another operator follows
- the operand, then the precedence of the unary operators can matter.
- Thus, @code{$x**2} means @code{($x)**2}, but @code{-x**2} means
- @code{-(x**2)}, because @samp{-} has lower precedence than @samp{**}
- while @samp{$} has higher precedence.
-
- Here is a table of the operators of @code{awk}, in order of increasing
- precedence:
-
- @table @asis
- @item assignment
- @samp{=}, @samp{+=}, @samp{-=}, @samp{*=}, @samp{/=}, @samp{%=},
- @samp{^=}, @samp{**=}. These operators group right-to-left.
-
- @item conditional
- @samp{?:}. These operators group right-to-left.
-
- @item logical ``or''
- @samp{||}.
-
- @item logical ``and''
- @samp{&&}.
-
- @item array membership
- @code{in}.
-
- @item matching
- @samp{~}, @samp{!~}.
-
- @item relational, and redirection
- The relational operators and the redirections have the same precedence
- level. Characters such as @samp{>} serve both as relationals and as
- redirections; the context distinguishes between the two meanings.
-
- The relational operators are @samp{<}, @samp{<=}, @samp{==}, @samp{!=},
- @samp{>=} and @samp{>}.
-
- The I/O redirection operators are @samp{<}, @samp{>}, @samp{>>} and
- @samp{|}.
-
- Note that I/O redirection operators in @code{print} and @code{printf}
- statements belong to the statement level, not to expressions. The
- redirection does not produce an expression which could be the operand of
- another operator. As a result, it does not make sense to use a
- redirection operator near another operator of lower precedence, without
- parentheses. Such combinations, for example @samp{print foo > a ? b :
- c}, result in syntax errors.
-
- @item concatenation
- No special token is used to indicate concatenation.
- The operands are simply written side by side.
- @c This is supposedly being fixed
- @ignore
- Concatenation has the same precedence as relational and redirection
- operators. These operators nest left to right. Thus, @code{4 5 > 6}
- concatenates first, yielding 1, while @code{6 < 4 5} compares first, and
- yields @code{"05"}.
- @end ignore
-
- @item add, subtract
- @samp{+}, @samp{-}.
-
- @item multiply, divide, mod
- @samp{*}, @samp{/}, @samp{%}.
-
- @item unary plus, minus, ``not''
- @samp{+}, @samp{-}, @samp{!}.
-
- @item exponentiation
- @samp{^}, @samp{**}. These operators group right-to-left.
-
- @item increment, decrement
- @samp{++}, @samp{--}.
-
- @item field
- @samp{$}.
- @end table
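-
- A few of these grouping rules can be checked directly. This sketch uses
- @samp{^} for exponentiation, and the variables @code{a} through
- @code{e} are introduced only to hold the results:
-
```shell
out=$(awk 'BEGIN {
    x = 3
    a = -x^2          # unary minus binds less tightly: -(x^2)
    b = (-x)^2        # parentheses overrule the precedence
    c = 2^3^2         # exponentiation groups right-to-left: 2^(3^2)
    d = 10 - 4 + 3    # equal precedence groups left-to-right: (10 - 4) + 3
    e = 1 + 2 * 3     # multiplication binds more tightly than addition
    print a, b, c, d, e
}')
echo "$out"    # prints: -9 9 512 9 7
```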
-
- @node Statements, Arrays, Expressions, Top
- @chapter Actions: Control Statements
- @cindex control statement
-
- @dfn{Control statements} such as @code{if}, @code{while}, and so on
- control the flow of execution in @code{awk} programs. Most of the
- control statements in @code{awk} are patterned on similar statements in
- C.
-
- All the control statements start with special keywords such as @code{if}
- and @code{while}, to distinguish them from simple expressions.
-
- Many control statements contain other statements; for example, the
- @code{if} statement contains another statement which may or may not be
- executed. The contained statement is called the @dfn{body}. If you
- want to include more than one statement in the body, group them into a
- single compound statement with curly braces, separating them with
- newlines or semicolons.
-
- @menu
- * If Statement:: Conditionally execute some @code{awk} statements.
-
- * While Statement:: Loop until some condition is satisfied.
-
- * Do Statement:: Do specified action while looping until some
- condition is satisfied.
-
- * For Statement:: Another looping statement, that provides
- initialization and increment clauses.
-
- * Break Statement:: Immediately exit the innermost enclosing loop.
-
- * Continue Statement:: Skip to the end of the innermost enclosing loop.
-
- * Next Statement:: Stop processing the current input record.
-
- * Exit Statement:: Stop execution of @code{awk}.
- @end menu
-
- @node If Statement, While Statement, Statements, Statements
- @section The @code{if} Statement
-
- @cindex @code{if} statement
- The @code{if}-@code{else} statement is @code{awk}'s decision-making
- statement. It looks like this:@refill
-
- @example
- if (@var{condition}) @var{then-body} @r{[}else @var{else-body}@r{]}
- @end example
-
- @noindent
- Here @var{condition} is an expression that controls what the rest of the
- statement will do. If @var{condition} is true, @var{then-body} is
- executed; otherwise, @var{else-body} is executed (assuming that the
- @code{else} clause is present). The @code{else} part of the statement is
- optional. The condition is considered false if its value is zero or
- the null string, true otherwise.@refill
-
- Here is an example:
-
- @example
- if (x % 2 == 0)
- print "x is even"
- else
- print "x is odd"
- @end example
-
- In this example, if the expression @code{x % 2 == 0} is true (that is,
- the value of @code{x} is divisible by 2), then the first @code{print}
- statement is executed, otherwise the second @code{print} statement is
- performed.@refill
-
- If the @code{else} appears on the same line as @var{then-body}, and
- @var{then-body} is not a compound statement (i.e., not surrounded by
- curly braces), then a semicolon must separate @var{then-body} from
- @code{else}. To illustrate this, let's rewrite the previous example:
-
- @group
- @example
- awk '@{ if (x % 2 == 0) print "x is even"; else
- print "x is odd" @}'
- @end example
- @end group
-
- @noindent
- If you forget the @samp{;}, @code{awk} won't be able to parse the
- statement, and you will get a syntax error.
-
- We would not actually write this example this way, because a human
- reader might fail to see the @code{else} if it were not the first thing
- on its line.
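-
- For comparison, here is a sketch of the more readable layout, with the
- @code{else} first on its own line. It tests the first field of each
- record rather than @code{x}, and the input numbers are arbitrary:
-
```shell
out=$(printf '3\n8\n' | awk '{
    if ($1 % 2 == 0)
        print $1, "is even"
    else
        print $1, "is odd"
}')
echo "$out"
```
-
- The output is @samp{3 is odd} followed by @samp{8 is even}.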
-
- @node While Statement, Do Statement, If Statement, Statements
- @section The @code{while} Statement
- @cindex @code{while} statement
- @cindex loop
- @cindex body of a loop
-
- In programming, a @dfn{loop} means a part of a program that is (or at least can
- be) executed two or more times in succession.
-
- The @code{while} statement is the simplest looping statement in
- @code{awk}. It repeatedly executes a statement as long as a condition is
- true. It looks like this:
-
- @example
- while (@var{condition})
- @var{body}
- @end example
-
- @noindent
- Here @var{body} is a statement that we call the @dfn{body} of the loop,
- and @var{condition} is an expression that controls how long the loop
- keeps running.
-
- The first thing the @code{while} statement does is test @var{condition}.
- If @var{condition} is true, it executes the statement @var{body}.
- (Truth, as usual in @code{awk}, means that the value of @var{condition}
- is not zero and not a null string.) After @var{body} has been executed,
- @var{condition} is tested again, and if it is still true, @var{body} is
- executed again. This process repeats until @var{condition} is no longer
- true. If @var{condition} is initially false, the body of the loop is
- never executed.@refill
-
- This example prints the first three fields of each record, one per line.
-
- @example
- awk '@{ i = 1
- while (i <= 3) @{
- print $i
- i++
- @}
- @}'
- @end example
-
- @noindent
- Here the body of the loop is a compound statement enclosed in braces,
- containing two statements.
-
- The loop works like this: first, the value of @code{i} is set to 1.
- Then, the @code{while} tests whether @code{i} is less than or equal to
- three. This is the case when @code{i} equals one, so the @code{i}-th
- field is printed. Then the @code{i++} increments the value of @code{i}
- and the loop repeats. The loop terminates when @code{i} reaches 4.
-
- As you can see, a newline is not required between the condition and the
- body; but using one makes the program clearer unless the body is a
- compound statement or is very simple. The newline after the open-brace
- that begins the compound statement is not required either, but the
- program would be hard to read without it.
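-
- The example can be run on a single record to watch the loop stop after
- the third field. The input line here is an assumption made for the
- sketch:
-
```shell
out=$(echo 'a b c d e' | awk '{
    i = 1
    while (i <= 3) {
        print $i    # print field number i
        i++
    }
}')
echo "$out"
```
-
- Only @samp{a}, @samp{b} and @samp{c} are printed, one per line; the
- fourth and fifth fields are never reached.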
-
- @node Do Statement, For Statement, While Statement, Statements
- @section The @code{do}-@code{while} Statement
-
- The @code{do} loop is a variation of the @code{while} looping statement.
- The @code{do} loop executes the @var{body} once, then repeats @var{body}
- as long as @var{condition} is true. It looks like this:
-
- @group
- @example
- do
- @var{body}
- while (@var{condition})
- @end example
- @end group
-
- Even if @var{condition} is false at the start, @var{body} is executed at
- least once (and only once, unless executing @var{body} makes
- @var{condition} true). Contrast this with the corresponding
- @code{while} statement:
-
- @example
- while (@var{condition})
- @var{body}
- @end example
-
- @noindent
- This statement does not execute @var{body} even once if @var{condition}
- is false to begin with.
-
- Here is an example of a @code{do} statement:
-
- @example
- awk '@{ i = 1
- do @{
- print $0
- i++
- @} while (i <= 10)
- @}'
- @end example
-
- @noindent
- This prints each input record ten times. It isn't a very realistic example,
- since in this case an ordinary @code{while} would do just as well. But
- this reflects actual experience; there is only occasionally a real use
- for a @code{do} statement.@refill
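-
- The one case where the two statements differ, a condition that is false
- from the start, can be shown in a sketch. The variables @code{i} and
- @code{j} and the messages are illustrative:
-
```shell
out=$(awk 'BEGIN {
    i = 10
    do {
        print "do body ran, i =", i    # runs once despite the false condition
        i++
    } while (i < 5)
    j = 10
    while (j < 5) {
        print "while body ran"         # never reached
        j++
    }
    print "done"
}')
echo "$out"
```
-
- The output is @samp{do body ran, i = 10} followed by @samp{done}.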
-
- @node For Statement, Break Statement, Do Statement, Statements
- @section The @code{for} Statement
- @cindex @code{for} statement
-
- The @code{for} statement makes it more convenient to count iterations of a
- loop. The general form of the @code{for} statement looks like this:@refill
-
- @example
- for (@var{initialization}; @var{condition}; @var{increment})
- @var{body}
- @end example
-
- @noindent
- This statement starts by executing @var{initialization}. Then, as long
- as @var{condition} is true, it repeatedly executes @var{body} and then
- @var{increment}. Typically @var{initialization} sets a variable to
- either zero or one, @var{increment} adds 1 to it, and @var{condition}
- compares it against the desired number of iterations.
-
- Here is an example of a @code{for} statement:
-
- @example
- awk '@{ for (i = 1; i <= 3; i++)
- print $i
- @}'
- @end example
-
- @noindent
- This prints the first three fields of each input record, one field per
- line.
-
- In the @code{for} statement, @var{body} stands for any statement, but
- @var{initialization}, @var{condition} and @var{increment} are just
- expressions. You cannot set more than one variable in the
- @var{initialization} part unless you use a multiple assignment statement
- such as @code{x = y = 0}, which is possible only if all the initial values
- are equal. (But you can initialize additional variables by writing
- their assignments as separate statements preceding the @code{for} loop.)
-
- The same is true of the @var{increment} part; to increment additional
- variables, you must write separate statements at the end of the loop.
- The C compound expression, using C's comma operator, would be useful in
- this context, but it is not supported in @code{awk}.
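As a minimal sketch of the workaround described above, the following loop tracks a second variable by initializing it before the @code{for} statement and updating it at the end of the body. The names @code{i} and @code{j} and the step of 2 are illustrative assumptions:

```shell
two_vars=$(awk 'BEGIN {
    j = 10                       # second variable, set before the loop
    for (i = 1; i <= 3; i++) {
        print i, j
        j -= 2                   # second variable, updated at the end of the body
    }
}')
echo "$two_vars"
```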
-
- Most often, @var{increment} is an increment expression, as in the
- example above. But this is not required; it can be any expression
- whatever. For example, this statement prints all the powers of 2
- between 1 and 100:
-
- @example
- for (i = 1; i <= 100; i *= 2)
- print i
- @end example
-
- Any of the three expressions in the parentheses following @code{for} may
- be omitted if there is nothing to be done there. Thus, @w{@samp{for (;x
- > 0;)}} is equivalent to @w{@samp{while (x > 0)}}. If the
- @var{condition} is omitted, it is treated as true, effectively
- yielding an infinite loop.@refill
-
- In most cases, a @code{for} loop is an abbreviation for a @code{while}
- loop, as shown here:
-
- @example
- @var{initialization}
- while (@var{condition}) @{
- @var{body}
- @var{increment}
- @}
- @end example
-
- @noindent
- The only exception is when the @code{continue} statement
- (@pxref{Continue Statement}) is used inside the loop; changing a
- @code{for} statement to a @code{while} statement in this way can change
- the effect of the @code{continue} statement inside the loop.
-
- There is an alternate version of the @code{for} loop, for iterating over
- all the indices of an array:
-
- @example
- for (i in array)
- @var{do something with} array[i]
- @end example
-
- @noindent
- @xref{Arrays}, for more information on this version of the @code{for} loop.
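A short sketch of this form, assuming a hypothetical array @code{seen} that counts how often each first field occurs. The order in which the indices are visited is not something to rely on, so the output is passed through @code{sort} here:

```shell
counts=$(printf 'apple\npear\napple\n' | awk '
    { seen[$1]++ }                   # count occurrences of each first field
    END {
        for (word in seen)           # visits every index of the array
            print word, seen[word]
    }' | sort)
echo "$counts"
```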
-
- The @code{awk} language has a @code{for} statement in addition to a
- @code{while} statement because often a @code{for} loop is both less work to
- type and more natural to think of. Counting the number of iterations is
- very common in loops. It can be easier to think of this counting as part
- of looping rather than as something to do inside the loop.
-
- The next section has more complicated examples of @code{for} loops.
-
- @node Break Statement, Continue Statement, For Statement, Statements
- @section The @code{break} Statement
- @cindex @code{break} statement
- @cindex loops, exiting
-
- The @code{break} statement jumps out of the innermost @code{for},
- @code{while}, or @code{do}-@code{while} loop that encloses it. The
- following example finds the smallest divisor of any integer, and also
- identifies prime numbers:@refill
-
- @example
- awk '# find smallest divisor of num
- @{ num = $1
- for (div = 2; div*div <= num; div++)
- if (num % div == 0)
- break
- if (num % div == 0)
- printf "Smallest divisor of %d is %d\n", num, div
- else
- printf "%d is prime\n", num @}'
- @end example
-
- When the remainder is zero in the first @code{if} statement, @code{awk}
- immediately @dfn{breaks out} of the containing @code{for} loop. This means
- that @code{awk} proceeds immediately to the statement following the loop
- and continues processing. (This is very different from the @code{exit}
- statement (@pxref{Exit Statement}), which stops the entire @code{awk}
- program.)@refill
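Because @code{break} leaves only the innermost enclosing loop, an outer loop keeps running. The nested loops below are a sketch with made-up bounds, intended only to show that behavior:

```shell
inner_only=$(awk 'BEGIN {
    for (i = 1; i <= 2; i++) {
        for (j = 1; j <= 3; j++) {
            if (j == 2)
                break            # leaves the inner loop only
            print i, j
        }
        print "outer", i         # still runs after each break
    }
}')
echo "$inner_only"
```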
-
- Here is another program equivalent to the previous one. It illustrates how
- the @var{condition} of a @code{for} or @code{while} could just as well be
- replaced with a @code{break} inside an @code{if}:
-
- @example
- awk '# find smallest divisor of num
- @{ num = $1
- for (div = 2; ; div++) @{
- if (num % div == 0) @{
- printf "Smallest divisor of %d is %d\n", num, div
- break
- @}
- if (div*div > num) @{
- printf "%d is prime\n", num
- break
- @}
- @}
- @}'
- @end example
-
- @node Continue Statement, Next Statement, Break Statement, Statements
- @section The @code{continue} Statement
-
- @cindex @code{continue} statement
- The @code{continue} statement, like @code{break}, is used only inside
- @code{for}, @code{while}, and @code{do}-@code{while} loops. It skips
- over the rest of the loop body, causing the next cycle around the loop
- to begin immediately. Contrast this with @code{break}, which jumps out
- of the loop altogether. Here is an example:@refill
-
- @example
- # print names that don't contain the string "ignore"
-
- # first, save the text of each line
- @{ names[NR] = $0 @}
-
- # print what we're interested in
- END @{
- for (x in names) @{
- if (names[x] ~ /ignore/)
- continue
- print names[x]
- @}
- @}
- @end example
-
- If one of the input records contains the string @samp{ignore}, this
- example skips the @code{print} statement for that record, and continues back to
- the first statement in the loop.
-
- This isn't a practical example of @code{continue}, since it would be
- just as easy to write the loop like this:
-
- @example
- for (x in names)
- if (names[x] !~ /ignore/)
- print names[x]
- @end example
-
- The @code{continue} statement in a @code{for} loop directs @code{awk} to
- skip the rest of the body of the loop, and resume execution with the
- increment-expression of the @code{for} statement. The following program
- illustrates this fact:@refill
-
- @example
- awk 'BEGIN @{
- for (x = 0; x <= 20; x++) @{
- if (x == 5)
- continue
- printf ("%d ", x)
- @}
- print ""
- @}'
- @end example
-
- @noindent
- This program prints all the numbers from 0 to 20, except for 5, for
- which the @code{printf} is skipped. Since the increment @code{x++}
- is not skipped, @code{x} does not remain stuck at 5. Contrast the
- @code{for} loop above with the @code{while} loop:
-
- @example
- awk 'BEGIN @{
- x = 0
- while (x <= 20) @{
- if (x == 5)
- continue
- printf ("%d ", x)
- x++
- @}
- print ""
- @}'
- @end example
-
- @noindent
- This program loops forever once @code{x} gets to 5.
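One way to repair the @code{while} version, shown here only as a sketch, is to advance @code{x} before executing @code{continue}, so that the skipped iteration still makes progress:

```shell
fixed=$(awk 'BEGIN {
    x = 0
    while (x <= 20) {
        if (x == 5) {
            x++              # advance the counter before skipping
            continue
        }
        printf("%d ", x)
        x++
    }
    print ""
}')
echo "$fixed"
```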
-
- @node Next Statement, Exit Statement, Continue Statement, Statements
- @section The @code{next} Statement
- @cindex @code{next} statement
-
- The @code{next} statement forces @code{awk} to immediately stop processing
- the current record and go on to the next record. This means that no
- further rules are executed for the current record. The rest of the
- current rule's action is not executed either.
-
- Contrast this with the effect of the @code{getline} function
- (@pxref{Getline}). That too causes @code{awk} to read the next record
- immediately, but it does not alter the flow of control in any way. So
- the rest of the current action executes with a new input record.
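The contrast can be sketched with two small programs run on the same two-line input. The input text and the messages are invented for illustration:

```shell
with_getline=$(printf 'skip\nb\n' | awk '
    /skip/ { getline; print "after getline:", $0 }   # action continues with the new record
           { print "rule 2:", $0 }')
with_next=$(printf 'skip\nb\n' | awk '
    /skip/ { next; print "never printed" }           # action and later rules are abandoned
           { print "rule 2:", $0 }')
echo "$with_getline"
echo "$with_next"
```

With @code{getline}, the rest of the first action and the second rule both see the record @samp{b}. With @code{next}, the record @samp{skip} is dropped and only the second rule runs on @samp{b}.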
-
- At the highest level, @code{awk} program execution is a loop that reads
- an input record and then tests each rule's pattern against it. If you
- think of this loop as a @code{for} statement whose body contains the
- rules, then the @code{next} statement is analogous to a @code{continue}
- statement: it skips to the end of the body of this implicit loop, and
- executes the increment (which reads another record).
-
- For example, if your @code{awk} program works only on records with four
- fields, and you don't want it to fail when given bad input, you might
- use this rule near the beginning of the program:
-
- @example
- NF != 4 @{
- printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
- next
- @}
- @end example
-
- @noindent
- so that the following rules will not see the bad record. The error
- message is redirected to the standard error output stream, as error
- messages should be. @xref{Special Files}.
-
- The @code{next} statement is not allowed in a @code{BEGIN} or @code{END}
- rule.
-
- @node Exit Statement, , Next Statement, Statements
- @section The @code{exit} Statement
-
- @cindex @code{exit} statement
- The @code{exit} statement causes @code{awk} to immediately stop
- executing the current rule and to stop processing input; any remaining input
- is ignored.@refill
-
- If an @code{exit} statement is executed from a @code{BEGIN} rule, the
- program stops processing everything immediately. No input records are
- read. However, if an @code{END} rule is present, it is executed
- (@pxref{BEGIN/END}).
-
- If @code{exit} is used as part of an @code{END} rule, it causes
- the program to stop immediately.
-
- An @code{exit} statement that is part of an ordinary rule (that is, not part
- of a @code{BEGIN} or @code{END} rule) stops the execution of any further
- automatic rules, but the @code{END} rule is executed if there is one.
- If you don't want the @code{END} rule to do its job in this case, you
- can set a variable to nonzero before the @code{exit} statement, and check
- that variable in the @code{END} rule.
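A minimal sketch of this technique follows. The flag name @code{failed}, the sample input, and the exit status 1 are assumptions made for the example. The @code{END} rule passes the status explicitly rather than relying on what a bare @code{exit} would return there:

```shell
status=0
summary=$(printf '1\n2\nbad\n4\n' | awk '
    $1 !~ /^[0-9]+$/ { failed = 1; exit 1 }   # set the flag, then leave
    { total += $1 }
    END {
        if (failed)
            exit 1          # skip the normal END work
        print "total:", total
    }') || status=$?
echo "summary: [$summary], status: $status"
```

Here the third record is not a number, so the total is never printed and the process exits with a nonzero status.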
-
- If an argument is supplied to @code{exit}, its value is used as the exit
- status code for the @code{awk} process. If no argument is supplied,
- @code{exit} returns status zero (success).@refill
-
- For example, let's say you've discovered an error condition you really
- don't know how to handle. Conventionally, programs report this by
- exiting with a nonzero status. Your @code{awk} program can do this
- using an @code{exit} statement with a nonzero argument. Here's an
- example of this:@refill
-
- @example
- BEGIN @{
- if (("date" | getline date_now) <= 0) @{
- print "Can't get system date" > "/dev/stderr"
- exit 4
- @}
- @}
- @end example
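From the calling shell, the exit status is available in @code{$?}. The following sketch uses a trivial program that always exits with status 4, just to show how a caller might capture it:

```shell
status=0
awk 'BEGIN { exit 4 }' || status=$?   # capture the status awk returned
echo "awk exited with status $status"
```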
-
- @node Arrays, Built-in, Statements, Top
- @chapter Arrays in @code{awk}
-
- An @dfn{array} is a table of various values, called @dfn{elements}. The
- elements of an array are distinguished by their @dfn{indices}. Indices
- may be either numbers or strings. Each array has a name, which looks
- like a variable name, but must not be in use as a variable name in the
- same @code{awk} program.
-
- @menu
- * Intro: Array Intro. Basic facts about arrays in @code{awk}.
- * Reference to Elements:: How to examine one element of an array.
- * Assigning Elements:: How to change an element of an array.
- * Example: Array Example. Sample program explained.
-
- * Scanning an Array:: A variation of the @code{for} statement. It loops
- through the indices of an array's existing elements.
-
- * Delete:: The @code{delete} statement removes an element from an array.
-
- * Multi-dimensional:: Emulating multi-dimensional arrays in @code{awk}.
- * Multi-scanning:: Scanning multi-dimensional arrays.
- @end menu
-
- @node Array Intro, Reference to Elements, Arrays, Arrays
- @section Introduction to Arrays
-
- @cindex arrays
- The @code{awk} language has one-dimensional @dfn{arrays} for storing groups
- of related strings or numbers.
-
- Every @code{awk} array must have a name. Array names have the same
- syntax as variable names; any valid variable name would also be a valid
- array name. But you cannot use one name in both ways (as an array and
- as a variable) in one @code{awk} program.
-
- Arrays in @code{awk} superficially resemble arrays in other programming
- languages, but there are fundamental differences. In @code{awk}, you
- don't need to specify the size of an array before you start to use it.
- What's more, in @code{awk} any number or even a string may be used as an
- array index.
-
- In most other languages, you have to @dfn{declare} an array and specify
- how many elements or components it has. In such languages, the
- declaration causes a contiguous block of memory to be allocated for that
- many elements. An index in the array must be a nonnegative integer; for
- example, the index 0 specifies the first element in the array, which is
- actually stored at the beginning of the block of memory. Index 1
- specifies the second element, which is stored in memory right after the
- first element, and so on. It is impossible to add more elements to the
- array, because it has room for only as many elements as you declared.
-
- A contiguous array of four elements might look like this, conceptually,
- if the element values are 8, @code{"foo"}, @code{""} and 30:@refill
-
- @example
- +---------+---------+--------+---------+
- | 8 | "foo" | "" | 30 | @r{value}
- +---------+---------+--------+---------+
- 0 1 2 3 @r{index}
- @end example
-
- @noindent
- Only the values are stored; the indices are implicit from the order of
- the values. 8 is the value at index 0, because 8 appears in the
- position with 0 elements before it.
-
- @cindex arrays, definition of
- @cindex associative arrays
- Arrays in @code{awk} are different: they are @dfn{associative}. This means
- that each array is a collection of pairs: an index, and its corresponding
- array element value:
-
- @example
- @r{Element} 4 @r{Value} 30
- @r{Element} 2 @r{Value} "foo"
- @r{Element} 1 @r{Value} 8
- @r{Element} 3 @r{Value} ""
- @end example
-
- @noindent
- We have shown the pairs in jumbled order because their order doesn't
- mean anything.
-
- One advantage of an associative array is that new pairs can be added
- at any time. For example, suppose we add to that array a tenth element
- whose value is @w{@code{"number ten"}}. The result is this:
-
- @example
- @r{Element} 10 @r{Value} "number ten"
- @r{Element} 4 @r{Value} 30
- @r{Element} 2 @r{Value} "foo"
- @r{Element} 1 @r{Value} 8
- @r{Element} 3 @r{Value} ""
- @end example
-
- @noindent
- Now the array is @dfn{sparse} (i.e., some indices are missing): it has
- elements 4 and 10, but doesn't have elements 5, 6, 7, 8, or 9.@refill
-
- Another consequence of associative arrays is that the indices don't
- have to be positive integers. Any number, or even a string, can be
- an index. For example, here is an array which translates words from
- English into French:
-
- @example
- @r{Element} "dog" @r{Value} "chien"
- @r{Element} "cat" @r{Value} "chat"
- @r{Element} "one" @r{Value} "un"
- @r{Element} 1 @r{Value} "un"
- @end example
-
- @noindent
- Here we decided to translate the number 1 in both spelled-out and
- numeric form---thus illustrating that a single array can have both
- numbers and strings as indices.
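The table above could be built with ordinary assignments, as in this sketch. The array name @code{french} is an assumption made for the example:

```shell
lookup=$(awk 'BEGIN {
    french["dog"] = "chien"
    french["cat"] = "chat"
    french["one"] = "un"
    french[1]     = "un"         # a numeric index in the same array
    print french["dog"], french["one"], french[1]
}')
echo "$lookup"
```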
-
- When @code{awk} creates an array for you, e.g., with the @code{split}
- built-in function (@pxref{String Functions}), that array's indices
- are consecutive integers starting at 1.
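For instance, this sketch splits a string on whitespace into a hypothetical array @code{color} and walks the resulting indices from 1 up to the count that @code{split} returns:

```shell
parts=$(awk 'BEGIN {
    n = split("red green blue", color)   # n is the number of pieces
    for (i = 1; i <= n; i++)
        print i, color[i]
}')
echo "$parts"
```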
-
- @node Reference to Elements, Assigning Elements, Array Intro, Arrays
- @section Referring to an Array Element
- @cindex array reference
- @cindex element of array
- @cindex reference to array
-
- The principal way of using an array is to refer to one of its elements.
- An array reference is an expression which looks like this:
-
- @example
- @var{array}[@var{index}]
- @end example
-
- @noindent
- Here @var{array} is the name of an array. The expression @var{index} is
- the index of the element of the array that you want.
-
- The value of the array reference is the current value of that array
- element. For example, @code{foo[4.3]} is an expression for the element
- of array @code{foo} at index 4.3.
-
- If you refer to an array element that has no recorded value, the value
- of the reference is @code{""}, the null string. This includes elements
- to which you have not assigned any value, and elements that have been
- deleted (@pxref{Delete}). Such a reference automatically creates that
- array element, with the null string as its value. (In some cases,
- this is unfortunate, because it might waste memory inside @code{awk}.)
-
- @cindex arrays, determining presence of elements
- You can find out if an element exists in an array at a certain index with
- the expression:
-
- @example
- @var{index} in @var{array}
- @end example
-
- @noindent
- This expression tests whether or not the particular index exists,
- without the side effect of creating that element if it is not present.
- The expression has the value 1 (true) if @code{@var{array}[@var{index}]}
- exists, and 0 (false) if it does not exist.@refill
-
- For example, to test whether the array @code{frequencies} contains the
- index @code{"2"}, you could write this statement:@refill
-
- @example
- if ("2" in frequencies) print "Subscript \"2\" is present."
- @end example
-
- Note that this is @emph{not} a test of whether or not the array
- @code{frequencies} contains an element whose @emph{value} is @code{"2"}.
- (There is no way to do that except to scan all the elements.) Also, this
- @emph{does not} create @code{frequencies["2"]}, while the following
- (incorrect) alternative would do so:@refill
-
- @example
- if (frequencies["2"] != "") print "Subscript \"2\" is present."
- @end example
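The side effect can be observed directly. In this sketch, which uses a hypothetical empty array @code{freq}, the @code{in} test finds nothing until the element has been referenced:

```shell
created=$(awk 'BEGIN {
    if ("2" in freq) print "present before reference"
    if (freq["2"] != "") print "has a value"    # this reference creates freq["2"]
    if ("2" in freq) print "present after reference"
}')
echo "$created"
```

Only the last message is printed: the element does not exist at first, the reference creates it with a null value, and the final @code{in} test then succeeds.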
-
- @node Assigning Elements, Array Example, Reference to Elements, Arrays
- @section Assigning Array Elements
- @cindex array assignment
- @cindex element assignment
-
- Array elements are lvalues: they can be assigned values just like
- @code{awk} variables:
-
- @example
- @var{array}[@var{subscript}] = @var{value}
- @end example
-
- @noindent
- Here @var{array} is the name of your array. The expression
- @var{subscript} is the index of the element of the array that you want
- to assign a value. The expression @var{value} is the value you are
- assigning to that element of the array.@refill
-
- @node Array Example, Scanning an Array, Assigning Elements, Arrays
- @section Basic Example of an Array
-
- The following program takes a list of lines, each beginning with a line
- number, and prints them out in order of line number. The line numbers are
- not in order, however, when they are first read: they are scrambled. This
- program sorts the lines by making an array using the line numbers as
- subscripts. It then prints out the lines in sorted order of their numbers.
- It is a very simple program, and gets confused if it encounters repeated
- numbers, gaps, or lines that don't begin with a number.@refill
-
- @example
- @{
- if ($1 > max)
- max = $1
- arr[$1] = $0
- @}
-
- END @{
- for (x = 1; x <= max; x++)
- print arr[x]
- @}
- @end example
-
- @ignore
- The first rule just initializes the variable @code{max}. (This is not
- strictly necessary, since an uninitialized variable has the null string
- as its value, and the null string is effectively zero when used in
- a context where a number is required.)
- @end ignore
-
- The first rule keeps track of the largest line number seen so far;
- it also stores each line into the array @code{arr}, at an index that
- is the line's number.
-
- The second rule runs after all the input has been read, to print out
- all the lines.
-
- When this program is run with the following input:
-
- @example
- 5 I am the Five man
- 2 Who are you? The new number two!
- 4 . . . And four on the floor
- 1 Who is number one?
- 3 I three you.
- @end example
-
- @noindent
- its output is this:
-
- @example
- 1 Who is number one?
- 2 Who are you? The new number two!
- 3 I three you.
- 4 . . . And four on the floor
- 5 I am the Five man
- @end example
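One possible way to cope with gaps in the line numbers, offered only as a sketch, is to test each index with @code{in} before printing. The sample input here is invented, and repeated numbers or lines without a leading number are still not handled:

```shell
sorted=$(printf '3 three\n1 one\n5 five\n' | awk '
    {
        if ($1 > max)
            max = $1
        arr[$1] = $0
    }
    END {
        for (x = 1; x <= max; x++)
            if (x in arr)        # skip line numbers that never appeared
                print arr[x]
    }')
echo "$sorted"
```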
-
-