- If a line number is repeated, the last line with a given number overrides
- the others.
-
- Gaps in the line numbers can be handled with an easy improvement to the
- program's @code{END} rule:
-
- @example
- END @{
- for (x = 1; x <= max; x++)
- if (x in arr)
- print arr[x]
- @}
- @end example
-
- @node Scanning an Array, Delete, Array Example, Arrays
- @section Scanning All Elements of an Array
- @cindex @code{for (x in @dots{})}
- @cindex arrays, special @code{for} statement
- @cindex scanning an array
-
- In programs that use arrays, often you need a loop that executes
- once for each element of an array. In other languages, where arrays are
- contiguous and indices are limited to nonnegative integers, this is
- easy: the largest index is one less than the length of the array, and you can
- find all the valid indices by counting from zero up to that value. This
- technique won't do the job in @code{awk}, since any number or string
- may be an array index. So @code{awk} has a special kind of @code{for}
- statement for scanning an array:
-
- @example
- for (@var{var} in @var{array})
- @var{body}
- @end example
-
- @noindent
- This loop executes @var{body} once for each different value that your
- program has previously used as an index in @var{array}, with the
- variable @var{var} set to that index.@refill
-
- Here is a program that uses this form of the @code{for} statement. The
- first rule scans the input records and notes which words appear (at
- least once) in the input, by storing a 1 into the array @code{used} with
- the word as index. The second rule scans the elements of @code{used} to
- find all the distinct words that appear in the input. It prints each
- word that is more than 10 characters long, and also prints the number of
- such words. @xref{Built-in}, for more information on the built-in
- function @code{length}.
-
- @example
- # Record a 1 for each word that is used at least once.
- @{
- for (i = 1; i <= NF; i++)
- used[$i] = 1
- @}
-
- # Find number of distinct words more than 10 characters long.
- END @{
- num_long_words = 0
- for (x in used)
- if (length(x) > 10) @{
- ++num_long_words
- print x
- @}
- print num_long_words, "words longer than 10 characters"
- @}
- @end example
-
- @noindent
- @xref{Sample Program}, for a more detailed example of this type.
-
- The order in which elements of the array are accessed by this statement
- is determined by the internal arrangement of the array elements within
- @code{awk} and cannot be controlled or changed. This can lead to
- problems if new elements are added to @var{array} by statements in
- @var{body}; you cannot predict whether or not the @code{for} loop will
- reach them. Similarly, changing @var{var} inside the loop can produce
- strange results. It is best to avoid such things.@refill
-
- @node Delete, Multi-dimensional, Scanning an Array, Arrays
- @section The @code{delete} Statement
- @cindex @code{delete} statement
- @cindex deleting elements of arrays
- @cindex removing elements of arrays
- @cindex arrays, deleting an element
-
- You can remove an individual element of an array using the @code{delete}
- statement:
-
- @example
- delete @var{array}[@var{index}]
- @end example
-
- When an array element is deleted, it is as if you had never referred to it
- and had never given it any value. Any value the element formerly had
- can no longer be obtained.
-
- Here is an example of deleting elements in an array:
-
- @example
- for (i in frequencies)
- delete frequencies[i]
- @end example
-
- @noindent
- This example removes all the elements from the array @code{frequencies}.
-
- If you delete an element, a subsequent @code{for} statement to scan the array
- will not report that element, and the @code{in} operator to check for
- the presence of that element will return 0:
-
- @example
- delete foo[4]
- if (4 in foo)
- print "This will never be printed"
- @end example
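
The effect of @code{delete} on both the @code{in} operator and a scanning
@code{for} loop can be checked with a small sketch like the following. It
assumes a POSIX-style @code{awk} on the search path; the shell function
name @code{delete_demo} is hypothetical and exists only for illustration.

```shell
# delete_demo: hypothetical helper showing `delete` and the `in` operator.
delete_demo() {
    awk 'BEGIN {
        foo[4] = "four"
        before = (4 in foo)      # 1: the element exists
        delete foo[4]
        after = (4 in foo)       # 0: the element is gone
        n = 0
        for (k in foo)           # a scan no longer reports the element
            n++
        print before, after, n
    }'
}
delete_demo   # prints "1 0 0"
```

Note that the test uses @code{in} rather than a reference such as
@code{foo[4]}, since merely referring to an element would create it again.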
-
- @node Multi-dimensional, Multi-scanning, Delete, Arrays
- @section Multi-dimensional Arrays
-
- @cindex subscripts, multi-dimensional in arrays
- @cindex arrays, multi-dimensional subscripts
- @cindex multi-dimensional subscripts
- A multi-dimensional array is an array in which an element is identified
- by a sequence of indices, not a single index. For example, a
- two-dimensional array requires two indices. The usual way (in most
- languages, including @code{awk}) to refer to an element of a
- two-dimensional array named @code{grid} is with
- @code{grid[@var{x},@var{y}]}.
-
- @vindex SUBSEP
- Multi-dimensional arrays are supported in @code{awk} through
- concatenation of indices into one string. What happens is that
- @code{awk} converts the indices into strings (@pxref{Conversion}) and
- concatenates them together, with a separator between them. This creates
- a single string that describes the values of the separate indices. The
- combined string is used as a single index into an ordinary,
- one-dimensional array. The separator used is the value of the built-in
- variable @code{SUBSEP}.
-
- For example, suppose we evaluate the expression @code{foo[5,12]="value"}
- when the value of @code{SUBSEP} is @code{"@@"}. The numbers 5 and 12 are
- converted to strings and concatenated with the value of @code{SUBSEP}
- (here @samp{@@}) between them, yielding @code{"5@@12"}; thus,
- the array element @code{foo["5@@12"]} is set to @code{"value"}.
-
- Once the element's value is stored, @code{awk} has no record of whether
- it was stored with a single index or a sequence of indices. The two
- expressions @code{foo[5,12]} and @w{@code{foo[5 SUBSEP 12]}} always have
- the same value.
-
- The default value of @code{SUBSEP} is actually the string @code{"\034"},
- which contains a nonprinting character that is unlikely to appear in an
- @code{awk} program or in the input data.
-
- The usefulness of choosing an unlikely character comes from the fact
- that index values that contain a string matching @code{SUBSEP} lead to
- combined strings that are ambiguous. Suppose that @code{SUBSEP} were
- @code{"@@"}; then @w{@code{foo["a@@b", "c"]}} and @w{@code{foo["a",
- "b@@c"]}} would be indistinguishable because both would actually be
- stored as @code{foo["a@@b@@c"]}. Because @code{SUBSEP} is
- @code{"\034"}, such confusion can actually happen only when an index
- contains the character with ASCII code 034, which is a rare
- event.@refill
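
The ambiguity described above can be reproduced with a short sketch. It
deliberately sets @code{SUBSEP} to @samp{@@}, a character that also occurs
inside the indices; the shell function name @code{subsep_demo} is
hypothetical, and a POSIX-style @code{awk} is assumed.

```shell
# subsep_demo: hypothetical helper showing two index pairs colliding.
subsep_demo() {
    awk 'BEGIN {
        SUBSEP = "@"                 # a separator that occurs in the indices
        foo["a@b", "c"] = "first"
        foo["a", "b@c"] = "second"   # same combined index "a@b@c": overwrites
        n = 0
        for (k in foo)
            n++
        print n, foo["a@b@c"]
    }'
}
subsep_demo   # prints "1 second"
```

Only one element exists afterwards, which is why the default separator is a
character that rarely appears in data.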
-
- You can test whether a particular index-sequence exists in a
- ``multi-dimensional'' array with the same operator @code{in} used for single
- dimensional arrays. Instead of a single index as the left-hand operand,
- write the whole sequence of indices, separated by commas, in
- parentheses:@refill
-
- @example
- (@var{subscript1}, @var{subscript2}, @dots{}) in @var{array}
- @end example
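
As a small illustration of this syntax, the sketch below stores one element
and then tests two index sequences. The array name @code{grid} follows the
earlier text; the shell function name @code{in_demo} is hypothetical.

```shell
# in_demo: hypothetical helper testing index sequences with `in`.
in_demo() {
    awk 'BEGIN {
        grid[1, 2] = "x"
        if ((1, 2) in grid)
            print "(1,2) present"
        if (!((2, 1) in grid))
            print "(2,1) absent"
    }'
}
in_demo
```

Neither test creates an element, so @code{grid} still holds exactly one
element after both have run.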
-
- The following example treats its input as a two-dimensional array of
- fields; it rotates this array 90 degrees clockwise and prints the
- result. It assumes that all lines have the same number of
- elements.
-
- @example
- awk '@{
- if (max_nf < NF)
- max_nf = NF
- max_nr = NR
- for (x = 1; x <= NF; x++)
- vector[x, NR] = $x
- @}
-
- END @{
- for (x = 1; x <= max_nf; x++) @{
- for (y = max_nr; y >= 1; --y)
- printf("%s ", vector[x, y])
- printf("\n")
- @}
- @}'
- @end example
-
- @noindent
- When given the input:
-
- @example
- 1 2 3 4 5 6
- 2 3 4 5 6 1
- 3 4 5 6 1 2
- 4 5 6 1 2 3
- @end example
-
- @noindent
- it produces:
-
- @example
- 4 3 2 1
- 5 4 3 2
- 6 5 4 3
- 1 6 5 4
- 2 1 6 5
- 3 2 1 6
- @end example
-
- @node Multi-scanning, , Multi-dimensional, Arrays
- @section Scanning Multi-dimensional Arrays
-
- There is no special @code{for} statement for scanning a
- ``multi-dimensional'' array; there cannot be one, because in truth there
- are no multi-dimensional arrays or elements; there is only a
- multi-dimensional @emph{way of accessing} an array.
-
- However, if your program has an array that is always accessed as
- multi-dimensional, you can get the effect of scanning it by combining
- the scanning @code{for} statement (@pxref{Scanning an Array}) with the
- @code{split} built-in function (@pxref{String Functions}). It works
- like this:
-
- @example
- for (combined in @var{array}) @{
- split(combined, separate, SUBSEP)
- @dots{}
- @}
- @end example
-
- @noindent
- This finds each concatenated, combined index in the array, and splits it
- into the individual indices by breaking it apart where the value of
- @code{SUBSEP} appears. The split-out indices become the elements of
- the array @code{separate}.
-
- Thus, suppose you have previously stored in @code{@var{array}[1,
- "foo"]}; then an element with index @code{"1\034foo"} exists in
- @var{array}. (Recall that the default value of @code{SUBSEP} contains
- the character with code 034.) Sooner or later the @code{for} statement
- will find that index and do an iteration with @code{combined} set to
- @code{"1\034foo"}. Then the @code{split} function is called as
- follows:
-
- @example
- split("1\034foo", separate, "\034")
- @end example
-
- @noindent
- The result of this is to set @code{separate[1]} to 1 and @code{separate[2]}
- to @code{"foo"}. Presto, the original sequence of separate indices has
- been recovered.
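
Put together, the technique might look like the following sketch. The array
name @code{arr} and the shell function name @code{scan_demo} are
hypothetical. Because the order of a scanning @code{for} loop cannot be
predicted, the output is piped through @code{sort} to make it repeatable.

```shell
# scan_demo: hypothetical helper that recovers the separate indices.
scan_demo() {
    awk 'BEGIN {
        arr[1, "foo"] = "a"
        arr[2, "bar"] = "b"
        for (combined in arr) {
            # Break the combined index apart at each SUBSEP character.
            split(combined, separate, SUBSEP)
            print separate[1], separate[2], arr[combined]
        }
    }' | sort
}
scan_demo
# prints:
# 1 foo a
# 2 bar b
```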
-
- @node Built-in, User-defined, Arrays, Top
- @chapter Built-in Functions
-
- @cindex built-in functions
- @dfn{Built-in} functions are functions that are always available for
- your @code{awk} program to call. This chapter defines all the built-in
- functions in @code{awk}; some of them are mentioned in other sections,
- but they are summarized here for your convenience. (You can also define
- new functions yourself. @xref{User-defined}.)
-
- @menu
- * Calling Built-in:: How to call built-in functions.
-
- * Numeric Functions:: Functions that work with numbers,
- including @code{int}, @code{sin} and @code{rand}.
-
- * String Functions:: Functions for string manipulation,
- such as @code{split}, @code{match}, and @code{sprintf}.
-
- * I/O Functions:: Functions for files and shell commands
- @end menu
-
- @node Calling Built-in, Numeric Functions, Built-in, Built-in
- @section Calling Built-in Functions
-
- To call a built-in function, write the name of the function followed
- by arguments in parentheses. For example, @code{atan2(y + z, 1)}
- is a call to the function @code{atan2}, with two arguments.
-
- Whitespace is ignored between the built-in function name and the
- open-parenthesis, but we recommend that you avoid using whitespace
- there. User-defined functions do not permit whitespace in this way, and
- you will find it easier to avoid mistakes by following a simple
- convention which always works: no whitespace after a function name.
-
- Each built-in function accepts a certain number of arguments. In most
- cases, any extra arguments given to built-in functions are ignored. The
- defaults for omitted arguments vary from function to function and are
- described under the individual functions.
-
- When a function is called, expressions that create the function's actual
- parameters are evaluated completely before the function call is performed.
- For example, in the code fragment:
-
- @example
- i = 4
- j = sqrt(i++)
- @end example
-
- @noindent
- the variable @code{i} is set to 5 before @code{sqrt} is called
- with a value of 4 for its actual parameter.
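
This can be confirmed by printing both variables after the call, as in the
sketch below (the shell function name @code{eval_demo} is hypothetical):

```shell
# eval_demo: hypothetical helper showing that the argument expression,
# including its side effect, is evaluated before sqrt runs.
eval_demo() {
    awk 'BEGIN {
        i = 4
        j = sqrt(i++)    # sqrt receives 4; i becomes 5
        print i, j
    }'
}
eval_demo   # prints "5 2"
```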
-
- @node Numeric Functions, String Functions, Calling Built-in, Built-in
- @section Numeric Built-in Functions
-
- Here is a full list of built-in functions that work with numbers:
-
- @table @code
- @item int(@var{x})
- This gives you the integer part of @var{x}, truncated toward 0. This
- produces the nearest integer to @var{x}, located between @var{x} and 0.
-
- For example, @code{int(3)} is 3, @code{int(3.9)} is 3, @code{int(-3.9)}
- is @minus{}3, and @code{int(-3)} is @minus{}3 as well.@refill
-
- @item sqrt(@var{x})
- This gives you the positive square root of @var{x}. It reports an error
- if @var{x} is negative. Thus, @code{sqrt(4)} is 2.@refill
-
- @item exp(@var{x})
- This gives you the exponential of @var{x}, or reports an error if
- @var{x} is out of range. The range of values @var{x} can have depends
- on your machine's floating point representation.@refill
-
- @item log(@var{x})
- This gives you the natural logarithm of @var{x}, if @var{x} is positive;
- otherwise, it reports an error.@refill
-
- @item sin(@var{x})
- This gives you the sine of @var{x}, with @var{x} in radians.
-
- @item cos(@var{x})
- This gives you the cosine of @var{x}, with @var{x} in radians.
-
- @item atan2(@var{y}, @var{x})
- This gives you the arctangent of @code{@var{y} / @var{x}}, with the
- result expressed in radians.
-
- @item rand()
- This gives you a random number. The values of @code{rand} are
- uniformly-distributed between 0 and 1. The value may be 0 but is never
- 1.
-
- Often you want random integers instead. Here is a user-defined function
- you can use to obtain a random nonnegative integer less than @var{n}:
-
- @example
- function randint(n) @{
- return int(n * rand())
- @}
- @end example
-
- @noindent
- The multiplication produces a random real number that is at least 0 and
- less than @var{n}. We then make it an integer (using @code{int}) between 0
- and @code{@var{n} @minus{} 1}.
-
- Here is an example where a similar function is used to produce
- random integers between 1 and @var{n}:
-
- @example
- awk '
- # Function to roll a simulated die.
- function roll(n) @{ return 1 + int(rand() * n) @}
-
- # Roll 3 six-sided dice and print total number of points.
- @{
- printf("%d points\n", roll(6)+roll(6)+roll(6))
- @}'
- @end example
-
- @strong{Note:} @code{rand} starts generating numbers from the same
- point, or @dfn{seed}, each time you run @code{awk}. This means that
- a program will produce the same results each time you run it.
- The numbers are random within one @code{awk} run, but predictable
- from run to run. This is convenient for debugging, but if you want
- a program to do different things each time it is used, you must change
- the seed to a value that will be different in each run. To do this,
- use @code{srand}.
-
- @item srand(@var{x})
- The function @code{srand} sets the starting point, or @dfn{seed},
- for generating random numbers to the value @var{x}.
-
- Each seed value leads to a particular sequence of ``random'' numbers.
- Thus, if you set the seed to the same value a second time, you will get
- the same sequence of ``random'' numbers again.
-
- If you omit the argument @var{x}, as in @code{srand()}, then the current
- date and time of day are used for a seed. This is the way to get random
- numbers that differ from one run to the next.
-
- The return value of @code{srand} is the previous seed. This makes it
- easy to keep track of the seeds for use in consistently reproducing
- sequences of random numbers.
- @end table
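
A few of the properties described in this section can be checked with a
sketch like the one below. The seed value 42 and the shell function name
@code{numeric_demo} are arbitrary choices made for illustration, and a
POSIX-style @code{awk} is assumed.

```shell
# numeric_demo: hypothetical helper exercising int, srand and rand.
numeric_demo() {
    awk 'BEGIN {
        # int() truncates toward zero.
        print int(3.9), int(-3.9)

        # Setting the same seed again restarts the same sequence.
        srand(42); a = rand()
        srand(42); b = rand()
        msg = (a == b) ? "same sequence" : "different sequence"
        print msg

        # Values built as int(n * rand()) stay in the range 0 .. n-1.
        ok = 1
        for (k = 0; k < 1000; k++) {
            r = int(6 * rand())
            if (r < 0 || r > 5)
                ok = 0
        }
        msg = ok ? "in range" : "out of range"
        print msg
    }'
}
numeric_demo
```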
-
- @node String Functions, I/O Functions, Numeric Functions, Built-in
- @section Built-in Functions for String Manipulation
-
- The functions in this section look at the text of one or more
- strings.
-
- @table @code
- @item index(@var{in}, @var{find})
- @findex index
- This searches the string @var{in} for the first occurrence of the string
- @var{find}, and returns the position where that occurrence begins in the
- string @var{in}. For example:@refill
-
- @example
- awk 'BEGIN @{ print index("peanut", "an") @}'
- @end example
-
- @noindent
- prints @samp{3}. If @var{find} is not found, @code{index} returns 0.
-
- @item length(@var{string})
- @findex length
- This gives you the number of characters in @var{string}. If
- @var{string} is a number, the length of the digit string representing
- that number is returned. For example, @code{length("abcde")} is 5. By
- contrast, @code{length(15 * 35)} works out to 3. How? Well, 15 * 35 =
- 525, and 525 is then converted to the string @samp{"525"}, which has
- three characters.
-
- If no argument is supplied, @code{length} returns the length of @code{$0}.
-
- @item match(@var{string}, @var{regexp})
- @findex match
- The @code{match} function searches the string, @var{string}, for the
- longest, leftmost substring matched by the regular expression,
- @var{regexp}. It returns the character position, or @dfn{index}, of
- where that substring begins (1, if it starts at the beginning of
- @var{string}). If no match is found, it returns 0.
-
- @vindex RSTART
- @vindex RLENGTH
- The @code{match} function sets the built-in variable @code{RSTART} to
- the index. It also sets the built-in variable @code{RLENGTH} to the
- length of the matched substring. If no match is found, @code{RSTART}
- is set to 0, and @code{RLENGTH} to @minus{}1.
-
- For example:
-
- @example
- awk '@{
- if ($1 == "FIND")
- regex = $2
- else @{
- where = match($0, regex)
- if (where)
- print "Match of", regex, "found at", where, "in", $0
- @}
- @}'
- @end example
-
- @noindent
- This program looks for lines that match the regular expression stored in
- the variable @code{regex}. This regular expression can be changed. If the
- first word on a line is @samp{FIND}, @code{regex} is changed to be the
- second word on that line. Therefore, given:
-
- @example
- FIND fo*bar
- My program was a foobar
- But none of it would doobar
- FIND Melvin
- JF+KM
- This line is property of The Reality Engineering Co.
- This file created by Melvin.
- @end example
-
- @noindent
- @code{awk} prints:
-
- @example
- Match of fo*bar found at 18 in My program was a foobar
- Match of Melvin found at 22 in This file created by Melvin.
- @end example
-
- @item split(@var{string}, @var{array}, @var{fieldsep})
- @findex split
- This divides @var{string} up into pieces separated by @var{fieldsep},
- and stores the pieces in @var{array}. The first piece is stored in
- @code{@var{array}[1]}, the second piece in @code{@var{array}[2]}, and so
- forth. The string value of the third argument, @var{fieldsep}, is used
- as a regexp to search for to find the places to split @var{string}. If
- the @var{fieldsep} is omitted, the value of @code{FS} is used.
- @code{split} returns the number of elements created.@refill
-
- The @code{split} function, then, splits strings into pieces in a
- manner similar to the way input lines are split into fields. For example:
-
- @example
- split("auto-da-fe", a, "-")
- @end example
-
- @noindent
- splits the string @samp{auto-da-fe} into three fields using @samp{-} as the
- separator. It sets the contents of the array @code{a} as follows:
-
- @example
- a[1] = "auto"
- a[2] = "da"
- a[3] = "fe"
- @end example
-
- @noindent
- The value returned by this call to @code{split} is 3.
-
- @item sprintf(@var{format}, @var{expression1},@dots{})
- @findex sprintf
- This returns (without printing) the string that @code{printf} would
- have printed out with the same arguments (@pxref{Printf}). For
- example:
-
- @example
- sprintf("pi = %.2f (approx.)", 22/7)
- @end example
-
- @noindent
- returns the string @w{@code{"pi = 3.14 (approx.)"}}.
-
- @item sub(@var{regexp}, @var{replacement}, @var{target})
- @findex sub
- The @code{sub} function alters the value of @var{target}.
- It searches this value, which should be a string, for the
- leftmost substring matched by the regular expression, @var{regexp},
- extending this match as far as possible. Then the entire string is
- changed by replacing the matched text with @var{replacement}.
- The modified string becomes the new value of @var{target}.
-
- This function is peculiar because @var{target} is not simply
- used to compute a value, and not just any expression will do: it
- must be a variable, field or array reference, so that @code{sub} can
- store a modified value there. If this argument is omitted, then the
- default is to use and alter @code{$0}.
-
- For example:@refill
-
- @example
- str = "water, water, everywhere"
- sub(/at/, "ith", str)
- @end example
-
- @noindent
- sets @code{str} to @w{@code{"wither, water, everywhere"}}, by replacing the
- leftmost, longest occurrence of @samp{at} with @samp{ith}.
-
- The @code{sub} function returns the number of substitutions made (either
- one or zero).
-
- If the special character @samp{&} appears in @var{replacement}, it
- stands for the precise substring that was matched by @var{regexp}. (If
- the regexp can match more than one string, then this precise substring
- may vary.) For example:@refill
-
- @example
- awk '@{ sub(/candidate/, "& and his wife"); print @}'
- @end example
-
- @noindent
- changes the first occurrence of @samp{candidate} to @samp{candidate
- and his wife} on each input line.
-
- The effect of this special character can be turned off by putting a
- backslash before it in the string. As usual, to insert one backslash in
- the string, you must write two backslashes. Therefore, write @samp{\\&}
- in a string constant to include a literal @samp{&} in the replacement.
- For example, here is how to replace the first @samp{|} on each line with
- an @samp{&}:@refill
-
- @example
- awk '@{ sub(/\|/, "\\&"); print @}'
- @end example
-
- @strong{Note:} as mentioned above, the third argument to @code{sub} must
- be an lvalue. Some versions of @code{awk} allow the third argument to
- be an expression which is not an lvalue. In such a case, @code{sub}
- would still search for the pattern and return 0 or 1, but the result of
- the substitution (if any) would be thrown away because there is no place
- to put it. Such versions of @code{awk} accept expressions like
- this:@refill
-
- @example
- sub(/USA/, "United States", "the USA and Canada")
- @end example
-
- @noindent
- But that is considered erroneous in @code{gawk}.
-
- @item gsub(@var{regexp}, @var{replacement}, @var{target})
- @findex gsub
- This is similar to the @code{sub} function, except @code{gsub} replaces
- @emph{all} of the longest, leftmost, @emph{nonoverlapping} matching
- substrings it can find. The @samp{g} in @code{gsub} stands for
- ``global'', which means replace everywhere. For example:@refill
-
- @example
- awk '@{ gsub(/Britain/, "United Kingdom"); print @}'
- @end example
-
- @noindent
- replaces all occurrences of the string @samp{Britain} with @samp{United
- Kingdom} for all input records.@refill
-
- The @code{gsub} function returns the number of substitutions made. If
- the variable to be searched and altered, @var{target}, is
- omitted, then the entire input record, @code{$0}, is used.@refill
-
- As in @code{sub}, the characters @samp{&} and @samp{\} are special, and
- the third argument must be an lvalue.
-
- @item substr(@var{string}, @var{start}, @var{length})
- @findex substr
- This returns a @var{length}-character-long substring of @var{string},
- starting at character number @var{start}. The first character of a
- string is character number one. For example,
- @code{substr("washington", 5, 3)} returns @code{"ing"}.@refill
-
- If @var{length} is not present, this function returns the whole suffix of
- @var{string} that begins at character number @var{start}. For example,
- @code{substr("washington", 5)} returns @code{"ington"}.
-
- @item tolower(@var{string})
- @findex tolower
- This returns a copy of @var{string}, with each upper-case character
- in the string replaced with its corresponding lower-case character.
- Nonalphabetic characters are left unchanged. For example,
- @code{tolower("MiXeD cAsE 123")} returns @code{"mixed case 123"}.
-
- @item toupper(@var{string})
- @findex toupper
- This returns a copy of @var{string}, with each lower-case character
- in the string replaced with its corresponding upper-case character.
- Nonalphabetic characters are left unchanged. For example,
- @code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}.
- @end table
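
Several of these functions are often used together. The sketch below reuses
strings from the examples above; the shell function name
@code{string_demo} is hypothetical. It uses @code{RSTART} and
@code{RLENGTH} to extract the matched text with @code{substr}, and shows
the count returned by @code{gsub}.

```shell
# string_demo: hypothetical helper combining match, substr, gsub and toupper.
string_demo() {
    awk 'BEGIN {
        s = "My program was a foobar"
        if (match(s, /fo*bar/))
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)

        t = "water, water, everywhere"
        n = gsub(/at/, "ith", t)     # n is the number of replacements
        print n, t

        print toupper(substr("washington", 5, 3))
    }'
}
string_demo
# prints:
# 18 6 foobar
# 2 wither, wither, everywhere
# ING
```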
-
- @node I/O Functions, , String Functions, Built-in
- @section Built-in Functions For Input/Output
-
- @table @code
- @item close(@var{filename})
- Close the file @var{filename}, for input or output. The argument may
- alternatively be a shell command that was used for redirecting to or
- from a pipe; then the pipe is closed.
-
- @xref{Close Input}, regarding closing input files and pipes.
- @xref{Close Output}, regarding closing output files and pipes.
-
- @item system(@var{command})
- @findex system
- @cindex interaction of @code{awk} with other programs
- The @code{system} function allows the user to execute operating system commands
- and then return to the @code{awk} program. The @code{system} function
- executes the command given by the string @var{command}. It returns, as
- its value, the status returned by the command that was executed.
-
- For example, if the following fragment of code is put in your @code{awk}
- program:
-
- @example
- END @{
- system("mail -s 'awk run done' operator < /dev/null")
- @}
- @end example
-
- @noindent
- the system operator will be sent mail when the @code{awk} program
- finishes processing input and begins its end-of-input processing.
-
- Note that much the same result can be obtained by redirecting
- @code{print} or @code{printf} into a pipe. However, if your @code{awk}
- program is interactive, @code{system} is useful for cranking up large
- self-contained programs, such as a shell or an editor.@refill
-
- Some operating systems cannot implement the @code{system} function.
- @code{system} causes a fatal error if it is not supported.
- @end table
-
- @node User-defined, Built-in Variables, Built-in, Top
- @chapter User-defined Functions
-
- @cindex user-defined functions
- @cindex functions, user-defined
- Complicated @code{awk} programs can often be simplified by defining
- your own functions. User-defined functions can be called just like
- built-in ones (@pxref{Function Calls}), but it is up to you to define
- them---to tell @code{awk} what they should do.
-
- @menu
- * Definition Syntax:: How to write definitions and what they mean.
- * Function Example:: An example function definition and what it does.
- * Function Caveats:: Things to watch out for.
- * Return Statement:: Specifying the value a function returns.
- @end menu
-
- @node Definition Syntax, Function Example, User-defined, User-defined
- @section Syntax of Function Definitions
- @cindex defining functions
- @cindex function definition
-
- Definitions of functions can appear anywhere between the rules of the
- @code{awk} program. Thus, the general form of an @code{awk} program is
- extended to include sequences of rules @emph{and} user-defined function
- definitions.
-
- The definition of a function named @var{name} looks like this:
-
- @example
- function @var{name} (@var{parameter-list}) @{
- @var{body-of-function}
- @}
- @end example
-
- @noindent
- The keyword @code{function} may be abbreviated @code{func}.
-
- @var{name} is the name of the function to be defined. A valid function
- name is like a valid variable name: a sequence of letters, digits and
- underscores, not starting with a digit.
-
- @var{parameter-list} is a list of the function's arguments and local
- variable names, separated by commas. When the function is called,
- the argument names are used to hold the argument values given in
- the call. The local variables are initialized to the null string.
-
- The @var{body-of-function} consists of @code{awk} statements. It is the
- most important part of the definition, because it says what the function
- should actually @emph{do}. The argument names exist to give the body a
- way to talk about the arguments; local variables, to give the body
- places to keep temporary values.
-
- Argument names are not distinguished syntactically from local variable
- names; instead, the number of arguments supplied when the function is
- called determines how many argument variables there are. Thus, if three
- argument values are given, the first three names in @var{parameter-list}
- are arguments, and the rest are local variables.
-
- It follows that if the number of arguments is not the same in all calls
- to the function, some of the names in @var{parameter-list} may be
- arguments on some occasions and local variables on others. Another
- way to think of this is that omitted arguments default to the
- null string.
-
- Usually when you write a function you know how many names you intend to
- use for arguments and how many you intend to use as locals. By
- convention, you should write an extra space between the arguments and
- the locals, so that other people can follow how your function is
- supposed to be used.
-
- During execution of the function body, the arguments and local variable
- values hide or @dfn{shadow} any variables of the same names used in the
- rest of the program. The shadowed variables are not accessible in the
- function definition, because there is no way to name them while their
- names have been taken away for the local variables. All other variables
- used in the @code{awk} program can be referenced or set normally in the
- function definition.
-
- The arguments and local variables last only as long as the function body
- is executing. Once the body finishes, the shadowed variables come back.
-
- The function body can contain expressions which call functions. They
- can even call this function, either directly or by way of another
- function. When this happens, we say the function is @dfn{recursive}.
-
- There is no need in @code{awk} to put the definition of a function
- before all uses of the function. This is because @code{awk} reads the
- entire program before starting to execute any of it.
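
As an illustration of local variables and shadowing, consider the sketch
below. The function @code{sum_to} and the shell wrapper
@code{locals_demo} are hypothetical names. The extra spaces in the
parameter list follow the convention described above: @code{n} is the
argument, while @code{i} and @code{total} are intended as locals.

```shell
# locals_demo: hypothetical helper showing locals declared as extra parameters.
locals_demo() {
    awk '
    # n is the argument; i and total are locals (note the extra spaces).
    function sum_to(n,    i, total) {
        for (i = 1; i <= n; i++)
            total += i           # total starts out as the null string
        return total
    }
    BEGIN {
        i = "global"
        print sum_to(4), i       # the global i is shadowed, not changed
    }'
}
locals_demo   # prints "10 global"
```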
-
- @node Function Example, Function Caveats, Definition Syntax, User-defined
- @section Function Definition Example
-
- Here is an example of a user-defined function, called @code{myprint}, that
- takes a number and prints it in a specific format.
-
- @example
- function myprint(num)
- @{
- printf "%6.3g\n", num
- @}
- @end example
-
- @noindent
- To illustrate, here is an @code{awk} rule which uses our @code{myprint}
- function:
-
- @example
- $3 > 0 @{ myprint($3) @}
- @end example
-
- @noindent
- This program prints, in our special format, all the third fields that
- contain a positive number in our input. Therefore, when given:
-
- @example
- 1.2 3.4 5.6 7.8
- 9.10 11.12 13.14 15.16
- 17.18 19.20 21.22 23.24
- @end example
-
- @noindent
- this program, using our function to format the results, prints:
-
- @example
- 5.6
- 13.1
- 21.2
- @end example
-
- Here is a rather contrived example of a recursive function. It prints a
- string backwards:
-
- @example
- function rev (str, len) @{
- if (len == 0) @{
- printf "\n"
- return
- @}
- printf "%c", substr(str, len, 1)
- rev(str, len - 1)
- @}
- @end example
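
One way to try @code{rev} is to call it on each input record, passing the
length of the record as the second argument. This is only a sketch; the
shell function name @code{rev_demo} is hypothetical.

```shell
# rev_demo: hypothetical wrapper that reverses each input line with rev.
rev_demo() {
    awk '
    function rev(str, len) {
        if (len == 0) {
            printf "\n"
            return
        }
        printf "%c", substr(str, len, 1)   # print the last remaining character
        rev(str, len - 1)                  # then recurse on the rest
    }
    { rev($0, length($0)) }'
}
printf 'hello\n' | rev_demo   # prints "olleh"
```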
-
- @node Function Caveats, Return Statement, Function Example, User-defined
- @section Calling User-defined Functions
-
- @dfn{Calling a function} means causing the function to run and do its job.
- A function call is an expression, and its value is the value returned by
- the function.
-
- A function call consists of the function name followed by the arguments
- in parentheses. What you write in the call for the arguments are
- @code{awk} expressions; each time the call is executed, these
- expressions are evaluated, and the values are the actual arguments. For
- example, here is a call to @code{foo} with three arguments:
-
- @example
- foo(x y, "lose", 4 * z)
- @end example
-
- @strong{Note:} whitespace characters (spaces and tabs) are not allowed
- between the function name and the open-parenthesis of the argument list.
- If you write whitespace by mistake, @code{awk} might think that you mean
- to concatenate a variable with an expression in parentheses. However, it
- notices that you used a function name and not a variable name, and reports
- an error.
-
- @cindex call by value
- When a function is called, it is given a @emph{copy} of the values of
- its arguments. This is called @dfn{call by value}. The caller may use
- a variable as the expression for the argument, but the called function
- does not know this: all it knows is what value the argument had. For
- example, if you write this code:
-
- @example
- foo = "bar"
- z = myfunc(foo)
- @end example
-
- @noindent
- then you should not think of the argument to @code{myfunc} as being
- ``the variable @code{foo}''. Instead, think of the argument as the
- string value, @code{"bar"}.
-
- If the function @code{myfunc} alters the values of its local variables,
- this has no effect on any other variables. In particular, if @code{myfunc}
- does this:
-
- @example
- function myfunc (win) @{
-     print win
-     win = "zzz"
-     print win
- @}
- @end example
-
- @noindent
- to change its first argument variable @code{win}, this @emph{does not}
- change the value of @code{foo} in the caller. The role of @code{foo} in
- calling @code{myfunc} ended when its value, @code{"bar"}, was computed.
- If @code{win} also exists outside of @code{myfunc}, the function body
- cannot alter this outer value, because it is shadowed during the
- execution of @code{myfunc} and cannot be seen or changed from there.
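-
- A minimal complete program, put together here from the two fragments
- above purely as an illustration, makes the effect visible:
-
- @example
- awk '
- function myfunc (win) @{
-     print win
-     win = "zzz"      # changes only the local copy
-     print win
- @}
-
- BEGIN @{
-     foo = "bar"
-     myfunc(foo)
-     print foo        # the value in the caller is unchanged
- @}'
- @end example
-
- @noindent
- This prints @samp{bar}, then @samp{zzz}, and finally @samp{bar} again,
- showing that the assignment to @code{win} did not reach @code{foo}.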
-
- @cindex call by reference
- However, when arrays are the parameters to functions, they are @emph{not}
- copied. Instead, the array itself is made available for direct manipulation
- by the function. This is usually called @dfn{call by reference}.
- Changes made to an array parameter inside the body of a function @emph{are}
- visible outside that function. @emph{This can be very dangerous if you don't
- watch what you are doing.} For example:@refill
-
- @example
- function changeit (array, ind, nvalue) @{
-     array[ind] = nvalue
- @}
-
- BEGIN @{
-     a[1] = 1 ; a[2] = 2 ; a[3] = 3
-     changeit(a, 2, "two")
-     printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
- @}
- @end example
-
- @noindent
- prints @samp{a[1] = 1, a[2] = two, a[3] = 3}, because calling
- @code{changeit} stores @code{"two"} in the second element of @code{a}.
-
- @node Return Statement, , Function Caveats, User-defined
- @section The @code{return} Statement
- @cindex @code{return} statement
-
- The body of a user-defined function can contain a @code{return} statement.
- This statement returns control to the rest of the @code{awk} program. It
- can also be used to return a value for use in the rest of the @code{awk}
- program. It looks like this:@refill
-
- @example
- return @var{expression}
- @end example
-
- The @var{expression} part is optional. If it is omitted, then the returned
- value is undefined and, therefore, unpredictable.
-
- A @code{return} statement with no value expression is assumed at the end of
- every function definition. So if control reaches the end of the function
- definition, then the function returns an unpredictable value.
-
- Here is an example of a user-defined function that returns a value
- for the largest number among the elements of an array:@refill
-
- @example
- function maxelt (vec,   i, ret) @{
-     for (i in vec) @{
-         if (ret == "" || vec[i] > ret)
-             ret = vec[i]
-     @}
-     return ret
- @}
- @end example
-
- @noindent
- You call @code{maxelt} with one argument, an array name. The local
- variables @code{i} and @code{ret} are not intended to be arguments;
- while there is nothing to stop you from passing two or three arguments
- to @code{maxelt}, the results would be strange. The extra space before
- @code{i} in the function parameter list is to indicate that @code{i} and
- @code{ret} are not supposed to be arguments. This is a convention which
- you should follow when you define functions.
-
- Here is a program that uses our @code{maxelt} function. It loads an
- array, calls @code{maxelt}, and then reports the maximum number in that
- array:@refill
-
- @example
- awk '
- function maxelt (vec,   i, ret) @{
-     for (i in vec) @{
-         if (ret == "" || vec[i] > ret)
-             ret = vec[i]
-     @}
-     return ret
- @}
-
- # Load all fields of each record into nums.
- @{
-     for (i = 1; i <= NF; i++)
-         nums[NR, i] = $i
- @}
-
- END @{
-     print maxelt(nums)
- @}'
- @end example
-
- Given the following input:
-
- @example
- 1 5 23 8 16
- 44 3 5 2 8 26
- 256 291 1396 2962 100
- -6 467 998 1101
- 99385 11 0 225
- @end example
-
- @noindent
- our program tells us (predictably) that:
-
- @example
- 99385
- @end example
-
- @noindent
- is the largest number in our array.
-
- @node Built-in Variables, Command Line, User-defined, Top
- @chapter Built-in Variables
- @cindex built-in variables
-
- Most @code{awk} variables are available for you to use for your own
- purposes; they never change except when your program assigns them, and
- never affect anything except when your program examines them.
-
- A few variables have special built-in meanings. Some of them @code{awk}
- examines automatically, so that they enable you to tell @code{awk} how
- to do certain things. Others are set automatically by @code{awk}, so
- that they carry information from the internal workings of @code{awk} to
- your program.
-
- This chapter documents all the built-in variables of @code{gawk}. Most
- of them are also documented in the chapters where their areas of
- activity are described.
-
- @menu
- * User-modified:: Built-in variables that you change to control @code{awk}.
-
- * Auto-set:: Built-in variables where @code{awk} gives you information.
- @end menu
-
- @node User-modified, Auto-set, Built-in Variables, Built-in Variables
- @section Built-in Variables That Control @code{awk}
- @cindex built-in variables, user modifiable
-
- This is a list of the variables which you can change to control how
- @code{awk} does certain things.
-
- @table @code
- @c it's unadvisable to have multiple index entries for the same name
- @c since in Info there is no way to distinguish the two.
- @c @vindex FS
- @item FS
- @code{FS} is the input field separator (@pxref{Field Separators}).
- The value is a single-character string or a multi-character regular
- expression that matches the separations between fields in an input
- record.
-
- The default value is @w{@code{" "}}, a string consisting of a single
- space. As a special exception, this value actually means that any
- sequence of spaces and tabs is a single separator. It also causes
- spaces and tabs at the beginning or end of a line to be ignored.
-
- You can set the value of @code{FS} on the command line using the
- @samp{-F} option:
-
- @example
- awk -F, '@var{program}' @var{input-files}
- @end example
-
- @item IGNORECASE
- @c @vindex IGNORECASE
- If @code{IGNORECASE} is nonzero, then @emph{all} regular expression
- matching is done in a case-independent fashion. In particular, regexp
- matching with @samp{~} and @samp{!~}, and the @code{gsub}, @code{index},
- @code{match}, @code{split} and @code{sub} functions all ignore case when
- doing their particular regexp operations. @strong{Note:} since field
- splitting with the value of the @code{FS} variable is also a regular
- expression operation, that too is done with case ignored.
- @xref{Case-sensitivity}.
-
- If @code{gawk} is in compatibility mode (@pxref{Command Line}), then
- @code{IGNORECASE} has no special meaning, and regexp operations are
- always case-sensitive.@refill
-
- @item OFMT
- @c @vindex OFMT
- This string is used by @code{awk} to control conversion of numbers to
- strings (@pxref{Conversion}). It works by being passed, in effect, as
- the first argument to the @code{sprintf} function. Its default value
- is @code{"%.6g"}.@refill
-
- @item OFS
- @c @vindex OFS
- This is the output field separator (@pxref{Output Separators}). It is
- output between the fields output by a @code{print} statement. Its
- default value is @w{@code{" "}}, a string consisting of a single space.
-
- @item ORS
- @c @vindex ORS
- This is the output record separator. It is output at the end of every
- @code{print} statement. Its default value is a string containing a
- single newline character, which could be written as @code{"\n"}.
- (@xref{Output Separators}.)@refill
-
- @item RS
- @c @vindex RS
- This is @code{awk}'s record separator. Its default value is a string
- containing a single newline character, which means that an input record
- consists of a single line of text. (@xref{Records}.)@refill
-
- @item SUBSEP
- @c @vindex SUBSEP
- @code{SUBSEP} is a subscript separator. It has the default value of
- @code{"\034"}, and is used to separate the parts of the name of a
- multi-dimensional array. Thus, if you access @code{foo[12,3]}, it
- really accesses @code{foo["12\0343"]}. (@xref{Multi-dimensional}.)@refill
- @end table
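-
- As a small sketch of how several of these variables work together (the
- input line and the chosen separator values are made up for this
- illustration), the following command splits its input on colons, joins
- the printed fields with a hyphen, and ends each output record with an
- exclamation point and a newline:
-
- @example
- echo "root:x:0" | awk 'BEGIN @{ FS = ":"; OFS = "-"; ORS = "!\n" @}
-                        @{ print $1, $3 @}'
- @end example
-
- @noindent
- This prints @samp{root-0!}. Setting the variables in a @code{BEGIN} rule
- ensures that they take effect before the first record is read.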
-
- @node Auto-set, , User-modified, Built-in Variables
- @section Built-in Variables That Convey Information to You
-
- This is a list of the variables that are set automatically by @code{awk}
- on certain occasions so as to provide information for your program.
-
- @table @code
- @item ARGC
- @itemx ARGV
- @c @vindex ARGC
- @c @vindex ARGV
- The command-line arguments available to @code{awk} are stored in an
- array called @code{ARGV}. @code{ARGC} is the number of command-line
- arguments present. @code{ARGV} is indexed from zero to @w{@code{ARGC - 1}}.
- @xref{Command Line}. For example:
-
- @example
- awk '@{ print ARGV[$1] @}' inventory-shipped BBS-list
- @end example
-
- @noindent
- In this example, @code{ARGV[0]} contains @code{"awk"}, @code{ARGV[1]}
- contains @code{"inventory-shipped"}, and @code{ARGV[2]} contains
- @code{"BBS-list"}. The value of @code{ARGC} is 3, one more than the
- index of the last element in @code{ARGV} since the elements are numbered
- from zero.@refill
-
- Notice that the @code{awk} program is not entered in @code{ARGV}. The
- other special command line options, with their arguments, are also not
- entered. But variable assignments on the command line @emph{are}
- treated as arguments, and do show up in the @code{ARGV} array.
-
- Your program can alter @code{ARGC} and the elements of @code{ARGV}.
- Each time @code{awk} reaches the end of an input file, it uses the next
- element of @code{ARGV} as the name of the next input file. By storing a
- different string there, your program can change which files are read.
- You can use @code{"-"} to represent the standard input. By storing
- additional elements and incrementing @code{ARGC} you can cause
- additional files to be read.
-
- If you decrease the value of @code{ARGC}, that eliminates input files
- from the end of the list. By recording the old value of @code{ARGC}
- elsewhere, your program can treat the eliminated arguments as
- something other than file names.
-
- To eliminate a file from the middle of the list, store the null string
- (@code{""}) into @code{ARGV} in place of the file's name. As a
- special feature, @code{awk} ignores file names that have been
- replaced with the null string.
-
- @item ENVIRON
- @vindex ENVIRON
- This is an array that contains the values of the environment. The array
- indices are the environment variable names; the values are the values of
- the particular environment variables. For example,
- @code{ENVIRON["HOME"]} might be @file{/u/close}. Changing this array
- does not affect the environment passed on to any programs that
- @code{awk} may spawn via redirection or the @code{system} function.
- (In a future version of @code{gawk}, it may do so.)
-
- Some operating systems may not have environment variables.
- On such systems, the array @code{ENVIRON} is empty.
-
- @item FILENAME
- @c @vindex FILENAME
- This is the name of the file that @code{awk} is currently reading.
- If @code{awk} is reading from the standard input (in other words,
- there are no files listed on the command line),
- @code{FILENAME} is set to @code{"-"}.
- @code{FILENAME} is changed each time a new file is read (@pxref{Reading
- Files}).@refill
-
- @item FNR
- @c @vindex FNR
- @code{FNR} is the current record number in the current file. @code{FNR} is
- incremented each time a new record is read (@pxref{Getline}).
- It is reinitialized to 0 each time a new input file is started.
-
- @item NF
- @c @vindex NF
- @code{NF} is the number of fields in the current input record.
- @code{NF} is set each time a new record is read, when a new field is
- created, or when @code{$0} changes (@pxref{Fields}).@refill
-
- @item NR
- @c @vindex NR
- This is the number of input records @code{awk} has processed since
- the beginning of the program's execution (@pxref{Records}).
- @code{NR} is set each time a new record is read.@refill
-
- @item RLENGTH
- @c @vindex RLENGTH
- @code{RLENGTH} is the length of the substring matched by the
- @code{match} function (@pxref{String Functions}). @code{RLENGTH} is set
- by invoking the @code{match} function. Its value is the length of the
- matched string, or @minus{}1 if no match was found.@refill
-
- @item RSTART
- @c @vindex RSTART
- @code{RSTART} is the start-index of the substring matched by the
- @code{match} function (@pxref{String Functions}). @code{RSTART} is set
- by invoking the @code{match} function. Its value is the position of the
- string where the matched substring starts, or 0 if no match was
- found.@refill
- @end table
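-
- The following sketch illustrates altering @code{ARGV} as described
- under @code{ARGC} and @code{ARGV} above. It is only an illustration; the
- choice of which file to skip is arbitrary. The @code{BEGIN} rule replaces
- one file name with the null string, so that @code{awk} skips that file,
- and the main rule labels each record with the value of @code{FILENAME}:
-
- @example
- awk 'BEGIN @{
-     # Blank out any argument naming BBS-list so that it is skipped.
-     for (i = 1; i < ARGC; i++)
-         if (ARGV[i] == "BBS-list")
-             ARGV[i] = ""
- @}
- @{ print FILENAME ": " $0 @}' inventory-shipped BBS-list
- @end example
-
- @noindent
- Only the records of @file{inventory-shipped} should appear in the
- output, each preceded by that file's name.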
-
- @node Command Line, Language History, Built-in Variables, Top
- @c node-name, next, previous, up
- @chapter Invocation of @code{awk}
- @cindex command line
- @cindex invocation of @code{gawk}
- @cindex arguments, command line
- @cindex options, command line
-
- There are two ways to run @code{awk}: with an explicit program, or with
- one or more program files. Here are templates for both of them; items
- enclosed in @samp{@r{[}@dots{}@r{]}} in these templates are optional.
-
- @example
- awk @r{[@code{-F@var{fs}}] [@code{-v @var{var}=@var{val}}] [@code{-V}] [@code{-C}] [@code{-c}] [@code{-a}] [@code{-e}] [@code{--}]} '@var{program}' @var{file} @dots{}
- awk @r{[@code{-F@var{fs}}] @code{-f @var{source-file}} [@code{-f @var{source-file} @dots{}}] [@code{-v @var{var}=@var{val}}] [@code{-V}] [@code{-C}] [@code{-c}] [@code{-a}] [@code{-e}] [@code{--}]} @var{file} @dots{}
- @end example
-
- @menu
- * Options:: Command line options and their meanings.
- * Other Arguments:: Input file names and variable assignments.
- * AWKPATH Variable:: Searching directories for @code{awk} programs.
- @end menu
-
- @node Options, Other Arguments, Command Line, Command Line
- @section Command Line Options
-
- Options begin with a minus sign, and consist of a single character.
- The options and their meanings are as follows:
-
- @table @code
- @item -F@var{fs}
- Sets the @code{FS} variable to @var{fs} (@pxref{Field Separators}).
-
- @item -f @var{source-file}
- Indicates that the @code{awk} program is to be found in @var{source-file}
- instead of in the first non-option argument.
-
- @item -v @var{var}=@var{val}
- @cindex @samp{-v} option
- Sets the variable @var{var} to the value @var{val} @emph{before}
- execution of the program begins. Such variable values are available
- inside the @code{BEGIN} rule (see below for a fuller explanation).
-
- The @samp{-v} option only has room to set one variable, but you can use
- it more than once, setting another variable each time, like this:
- @samp{@w{-v foo=1} @w{-v bar=2}}.
-
- @item -a
- Specifies use of traditional @code{awk} syntax for regular expressions.
- This means that @samp{\} can be used to quote any regular expression
- operators inside of square brackets, just as it can be outside of them.
- This mode is currently the default; the @samp{-a} option is useful in
- shell scripts so that they will not break if the default is changed.
- @xref{Regexp Operators}.
-
- @item -e
- Specifies use of @code{egrep} syntax for regular expressions. This
- means that @samp{\} does not serve as a quoting character inside of
- square brackets; idiosyncratic techniques are needed to include various
- special characters within them. This mode may become the default at
- some time in the future. @xref{Regexp Operators}.
-
- @item -c
- @cindex @samp{-c} option
- Specifies @dfn{compatibility mode}, in which the GNU extensions in
- @code{gawk} are disabled, so that @code{gawk} behaves just like Unix
- @code{awk}. These extensions are noted below, where their usage is
- explained. @xref{Compatibility Mode}.
-
- @item -V
- @cindex @samp{-V} option
- Prints version information for this particular copy of @code{gawk}.
- This is so you can determine if your copy of @code{gawk} is up to date
- with respect to whatever the Free Software Foundation is currently
- distributing. This option may disappear in a future version of @code{gawk}.
-
- @item -C
- @cindex @samp{-C} option
- Prints the short version of the General Public License.
- This option may disappear in a future version of @code{gawk}.
-
- @item --
- Signals the end of the command line options. The following arguments
- are not treated as options even if they begin with @samp{-}. This
- interpretation of @samp{--} follows the POSIX argument parsing
- conventions.
-
- This is useful if you have file names that start with @samp{-},
- or in shell scripts, if you have file names that will be specified
- by the user and that might start with @samp{-}.
- @end table
-
- Any other options are flagged as invalid with a warning message, but
- are otherwise ignored.
-
- In compatibility mode, as a special case, if the value of @var{fs} supplied
- to the @samp{-F} option is @samp{t}, then @code{FS} is set to the tab
- character (@code{"\t"}). Also, the @samp{-C} and @samp{-V} options
- are not recognized.@refill
-
- If the @samp{-f} option is @emph{not} used, then the first non-option
- command line argument is expected to be the program text.
-
- The @samp{-f} option may be used more than once on the command line.
- Then @code{awk} reads its program source from all of the named files, as
- if they had been concatenated together into one big file. This is
- useful for creating libraries of @code{awk} functions. Useful functions
- can be written once, and then retrieved from a standard place, instead
- of having to be included into each individual program. You can still
- type in a program at the terminal and use library functions, by specifying
- @samp{-f /dev/tty}. @code{awk} will read a file from the terminal
- to use as part of the @code{awk} program. After typing your program,
- type @kbd{Control-d} (the end-of-file character) to terminate it.
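-
- For instance, assuming a hypothetical library file @file{maxelt.awk}
- that holds only function definitions and a hypothetical file
- @file{report.awk} that holds the rules using them (both names are
- invented for this sketch, as is @file{data}), the two could be combined
- like this:
-
- @example
- awk -f maxelt.awk -f report.awk data
- @end example
-
- @noindent
- To type the rules at the terminal instead, while still using the
- library, the second file name would be replaced by @file{/dev/tty}:
-
- @example
- awk -f maxelt.awk -f /dev/tty data
- @end example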
-
- @node Other Arguments, AWKPATH Variable, Options, Command Line
- @section Other Command Line Arguments
-
- Any additional arguments on the command line are normally treated as
- input files to be processed in the order specified. However, an
- argument of the form @code{@var{var}=@var{value}} assigns
- the value @var{value} to the variable @var{var}; it does not specify a
- file at all.
-
- @vindex ARGV
- All these arguments are made available to your @code{awk} program in the
- @code{ARGV} array (@pxref{Built-in Variables}). Command line options
- and the program text (if present) are omitted from the @code{ARGV}
- array. All other arguments, including variable assignments, are
- included.
-
- The distinction between file name arguments and variable-assignment
- arguments is made when @code{awk} is about to open the next input file.
- At that point in execution, it checks the ``file name'' to see whether
- it is really a variable assignment; if so, @code{awk} sets the variable
- instead of reading a file.
-
- Therefore, the variables actually receive the specified values after all
- previously specified files have been read. In particular, the values of
- variables assigned in this fashion are @emph{not} available inside a
- @code{BEGIN} rule (@pxref{BEGIN/END}), since such rules are run before
- @code{awk} begins scanning the argument list.@refill
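-
- The difference from the @samp{-v} option can be seen in this pair of
- commands, offered as a sketch of the behavior described here (the
- variable name @code{n} is arbitrary):
-
- @example
- awk -v n=1 'BEGIN @{ print "n is", n @}'
- awk 'BEGIN @{ print "n is", n @}' n=1
- @end example
-
- @noindent
- The first command prints @samp{n is 1}, because @samp{-v} assigns the
- variable before the @code{BEGIN} rule runs. In the second, @code{n} has
- not yet been assigned when the @code{BEGIN} rule runs, so its value
- prints as the null string.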
-
- In some earlier implementations of @code{awk}, when a variable assignment
- occurred before any file names, the assignment would happen @emph{before}
-