Text File | 1990-06-07 | 48.5 KB | 1,375 lines |
- The input is read in units called @dfn{records}, and processed by the
- rules one record at a time. By default, each record is one line. Each
- record read is split automatically into @dfn{fields}, to make it more
- convenient for a rule to work on parts of the record under
- consideration.
-
- On rare occasions you will need to use the @code{getline} command,
- which can do explicit input from any number of files (@pxref{Getline}).
-
- @menu
- * Records:: Controlling how data is split into records.
- * Fields:: An introduction to fields.
- * Non-Constant Fields:: Non-constant Field Numbers.
- * Changing Fields:: Changing the Contents of a Field.
- * Field Separators:: The field separator and how to change it.
- * Multiple Line:: Reading multi-line records.
-
- * Getline:: Reading files under explicit program control
- using the @code{getline} function.
-
- * Close Input:: Closing an input file (so you can read from
- the beginning once more).
- @end menu
-
- @node Records, Fields, Reading Files, Reading Files
- @section How Input is Split into Records
-
- @cindex record separator
- The @code{awk} language divides its input into records and fields.
- Records are separated by a character called the @dfn{record separator}.
- By default, the record separator is the newline character. Therefore,
- normally, a record is a line of text.@refill
-
- @c @cindex changing the record separator
- @vindex RS
- Sometimes you may want to use a different character to separate your
- records. You can use different characters by changing the built-in
- variable @code{RS}.
-
- The value of @code{RS} is a string that says how to separate records;
- the default value is @code{"\n"}, the string of just a newline
- character. This is why records are, by default, single lines.
-
- @code{RS} can have any string as its value, but only the first character
- of the string is used as the record separator. The other characters are
- ignored. @code{RS} is exceptional in this regard; @code{awk} uses the
- full value of all its other built-in variables.@refill
-
- @ignore
- Someday this should be true!
-
- The value of @code{RS} is not limited to a one-character string. It can
- be any regular expression (@pxref{Regexp}). In general, each record
- ends at the next string that matches the regular expression; the next
- record starts at the end of the matching string. This general rule is
- actually at work in the usual case, where @code{RS} contains just a
- newline: a record ends at the beginning of the next matching string (the
- next newline in the input) and the following record starts just after
- the end of this string (at the first character of the following line).
- The newline, since it matches @code{RS}, is not part of either record.
- @end ignore
-
- You can change the value of @code{RS} in the @code{awk} program with the
- assignment operator, @samp{=} (@pxref{Assignment Ops}). The new
- record-separator character should be enclosed in quotation marks to make
- a string constant. Often the right time to do this is at the beginning
- of execution, before any input has been processed, so that the very
- first record will be read with the proper separator. To do this, use
- the special @code{BEGIN} pattern (@pxref{BEGIN/END}). For
- example:@refill
-
- @example
- awk 'BEGIN @{ RS = "/" @} ; @{ print $0 @}' BBS-list
- @end example
-
- @noindent
- changes the value of @code{RS} to @code{"/"}, before reading any input.
- This is a string whose first character is a slash; as a result, records
- are separated by slashes. Then the input file is read, and the second
- rule in the @code{awk} program (the action with no pattern) prints each
- record. Since each @code{print} statement adds a newline at the end of
- its output, the effect of this @code{awk} program is to copy the input
- with each slash changed to a newline.
-
- Another way to change the record separator is on the command line,
- using the variable-assignment feature (@pxref{Command Line}).
-
- @example
- awk '@dots{}' RS="/" @var{source-file}
- @end example
-
- @noindent
- This sets @code{RS} to @samp{/} before processing @var{source-file}.
-
- The empty string (a string of no characters) has a special meaning
- as the value of @code{RS}: it means that records are separated only
- by blank lines. @xref{Multiple Line}, for more details.
-
- @cindex number of records, @code{NR} or @code{FNR}
- @vindex NR
- @vindex FNR
- The @code{awk} utility keeps track of the number of records that have
- been read so far from the current input file. This value is stored in a
- built-in variable called @code{FNR}. It is reset to zero when a new
- file is started. Another built-in variable, @code{NR}, is the total
- number of input records read so far from all files. It starts at zero
- but is never automatically reset to zero.
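The difference between @code{FNR} and @code{NR} is easiest to see with two input files. The following is only a sketch: the two temporary files and their contents are made up for illustration, and it assumes a system that provides @code{mktemp}.

```shell
# Two throwaway input files (hypothetical names in a temporary directory).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first"
printf 'c\nd\ne\n' > "$dir/second"

# Print the per-file record count (FNR) next to the running total (NR).
counts=$(awk '{ print FNR, NR, $0 }' "$dir/first" "$dir/second")
printf '%s\n' "$counts"

rm -r "$dir"
```

The first column starts over at 1 when the second file begins, while the second column keeps counting.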
-
- If you change the value of @code{RS} in the middle of an @code{awk} run,
- the new value is used to delimit subsequent records, but the record
- currently being processed (and any record already finished) is not
- affected.
-
- @node Fields, Non-Constant Fields, Records, Reading Files
- @section Examining Fields
-
- @cindex examining fields
- @cindex fields
- @cindex accessing fields
- When @code{awk} reads an input record, the record is
- automatically separated or @dfn{parsed} by the interpreter into pieces
- called @dfn{fields}. By default, fields are separated by whitespace,
- like words in a line.
- Whitespace in @code{awk} means any string of one or more spaces and/or
- tabs; other characters such as newline, formfeed, and so on, that are
- considered whitespace by other languages are @emph{not} considered
- whitespace by @code{awk}.
-
- The purpose of fields is to make it more convenient for you to refer to
- these pieces of the record. You don't have to use them---you can
- operate on the whole record if you wish---but fields are what make
- simple @code{awk} programs so powerful.
-
- @cindex @code{$} (field operator)
- @cindex operators, @code{$}
- To refer to a field in an @code{awk} program, you use a dollar-sign,
- @samp{$}, followed by the number of the field you want. Thus, @code{$1}
- refers to the first field, @code{$2} to the second, and so on. For
- example, suppose the following is a line of input:@refill
-
- @example
- This seems like a pretty nice example.
- @end example
-
- @noindent
- Here the first field, or @code{$1}, is @samp{This}; the second field, or
- @code{$2}, is @samp{seems}; and so on. Note that the last field,
- @code{$7}, is @samp{example.}. Because there is no space between the
- @samp{e} and the @samp{.}, the period is considered part of the seventh
- field.@refill
-
- No matter how many fields there are, the last field in a record can be
- represented by @code{$NF}. So, in the example above, @code{$NF} would
- be the same as @code{$7}, which is @samp{example.}. Why this works is
- explained below (@pxref{Non-Constant Fields}). If you try to refer to a
- field beyond the last one, such as @code{$8} when the record has only 7
- fields, you get the empty string.
-
- @vindex NF
- @cindex number of fields, @code{NF}
- Plain @code{NF}, with no @samp{$}, is a built-in variable whose value
- is the number of fields in the current record.
-
- @code{$0}, which looks like an attempt to refer to the zeroth field, is
- a special case: it represents the whole input record. This is what you
- would use when you aren't interested in fields.
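As a quick check of the points above, this sketch feeds the sample line to @code{awk} and prints @code{NF}, @code{$NF}, the out-of-range field @code{$8} (in brackets, so that the empty string is visible), and @code{$0}. The shell variable name @code{fields} is just a placeholder.

```shell
# NF, the last field, a field beyond the last one, and the whole record.
fields=$(echo 'This seems like a pretty nice example.' |
    awk '{ print NF; print $NF; print "[" $8 "]"; print $0 }')
printf '%s\n' "$fields"
```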
-
- Here are some more examples:
-
- @example
- awk '$1 ~ /foo/ @{ print $0 @}' BBS-list
- @end example
-
- @noindent
- This example prints each record in the file @file{BBS-list} whose first
- field contains the string @samp{foo}. The operator @samp{~} is called a
- @dfn{matching operator} (@pxref{Comparison Ops}); it tests whether a
- string (here, the field @code{$1}) contains a match for a given regular
- expression.@refill
-
- By contrast, the following example:
-
- @example
- awk '/foo/ @{ print $1, $NF @}' BBS-list
- @end example
-
- @noindent
- looks for @samp{foo} in @emph{the entire record} and prints the first
- field and the last field for each input record containing a
- match.@refill
-
- @node Non-Constant Fields, Changing Fields, Fields, Reading Files
- @section Non-constant Field Numbers
-
- The number of a field does not need to be a constant. Any expression in
- the @code{awk} language can be used after a @samp{$} to refer to a
- field. The value of the expression specifies the field number. If the
- value is a string, rather than a number, it is converted to a number.
- Consider this example:@refill
-
- @example
- awk '@{ print $NR @}'
- @end example
-
- @noindent
- Recall that @code{NR} is the number of records read so far: 1 in the
- first record, 2 in the second, etc. So this example prints the first
- field of the first record, the second field of the second record, and so
- on. For the twentieth record, field number 20 is printed; most likely,
- the record has fewer than 20 fields, so this prints a blank line.
-
- Here is another example of using expressions as field numbers:
-
- @example
- awk '@{ print $(2*2) @}' BBS-list
- @end example
-
- The @code{awk} language must evaluate the expression @code{(2*2)} and use
- its value as the number of the field to print. The @samp{*} sign
- represents multiplication, so the expression @code{2*2} evaluates to 4.
- The parentheses are used so that the multiplication is done before the
- @samp{$} operation; they are necessary whenever there is a binary
- operator in the field-number expression. This example, then, prints the
- hours of operation (the fourth field) for every line of the file
- @file{BBS-list}.@refill
-
- If the field number you compute is zero, you get the entire record.
- Thus, @code{$(2-2)} has the same value as @code{$0}. Negative field
- numbers are not allowed.
-
- The number of fields in the current record is stored in the built-in
- variable @code{NF} (@pxref{Built-in Variables}). The expression
- @code{$NF} is not a special feature: it is the direct consequence of
- evaluating @code{NF} and using its value as a field number.
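The need for parentheses can be illustrated with a made-up record of three numbers. In this sketch, @code{$(NF-1)} selects the next-to-last field, while @code{$NF-1} subtracts one from the last field, because @samp{$} is applied before the subtraction.

```shell
# $(NF-1) is field number 2 here; $NF-1 is (field 3) minus 1.
result=$(echo '10 20 30' | awk '{ print $(NF-1), $NF-1 }')
printf '%s\n' "$result"
```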
-
- @node Changing Fields, Field Separators, Non-Constant Fields, Reading Files
- @section Changing the Contents of a Field
-
- @cindex field, changing contents of
- @cindex changing contents of a field
- @cindex assignment to fields
- You can change the contents of a field as seen by @code{awk} within an
- @code{awk} program; this changes what @code{awk} perceives as the
- current input record. (The actual input is untouched: @code{awk} never
- modifies the input file.)
-
- Look at this example:
-
- @example
- awk '@{ $3 = $2 - 10; print $2, $3 @}' inventory-shipped
- @end example
-
- @noindent
- The @samp{-} sign represents subtraction, so this program reassigns
- field three, @code{$3}, to be the value of field two minus ten,
- @code{$2 - 10}. (@xref{Arithmetic Ops}.) Then field two, and the
- new value for field three, are printed.
-
- In order for this to work, the text in field @code{$2} must make sense
- as a number; the string of characters must be converted to a number in
- order for the computer to do arithmetic on it. The number resulting
- from the subtraction is converted back to a string of characters which
- then becomes field three. @xref{Conversion}.
-
- When you change the value of a field (as perceived by @code{awk}), the
- text of the input record is recalculated to contain the new field where
- the old one was. Therefore, @code{$0} changes to reflect the altered
- field. Thus,
-
- @example
- awk '@{ $2 = $2 - 10; print $0 @}' inventory-shipped
- @end example
-
- @noindent
- prints a copy of the input file, with 10 subtracted from the second
- field of each line.
-
- You can also assign contents to fields that are out of range. For
- example:
-
- @example
- awk '@{ $6 = ($5 + $4 + $3 + $2) ; print $6 @}' inventory-shipped
- @end example
-
- @noindent
- We've just created @code{$6}, whose value is the sum of fields
- @code{$2}, @code{$3}, @code{$4}, and @code{$5}. The @samp{+} sign
- represents addition. For the file @file{inventory-shipped}, @code{$6}
- represents the total number of parcels shipped for a particular month.
-
- Creating a new field changes the internal @code{awk} copy of the current
- input record---the value of @code{$0}. Thus, if you do @samp{print $0}
- after adding a field, the record printed includes the new field, with
- the appropriate number of field separators between it and the previously
- existing fields.
-
- This recomputation affects and is affected by several features not yet
- discussed, in particular, the @dfn{output field separator}, @code{OFS},
- which is used to separate the fields (@pxref{Output Separators}), and
- @code{NF} (the number of fields; @pxref{Fields}). For example, the
- value of @code{NF} is set to the number of the highest field you
- create.@refill
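Here is a small sketch of that recomputation, using a made-up three-field record. Assigning to @code{$5} creates an empty @code{$4} as well, raises @code{NF} to 5, and rebuilds @code{$0} with the output field separator between every pair of fields.

```shell
# Default OFS (a single space): note the two spaces around the empty $4.
grown=$(echo 'a b c' | awk '{ $5 = "e"; print NF; print $0 }')
printf '%s\n' "$grown"

# The same assignment with OFS set to "-" makes the empty field easier to see.
dashed=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print $0 }')
printf '%s\n' "$dashed"
```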
-
- Note, however, that merely @emph{referencing} an out-of-range field
- does @emph{not} change the value of either @code{$0} or @code{NF}.
- Referencing an out-of-range field merely produces a null string. For
- example:@refill
-
- @example
- if ($(NF+1) != "")
- print "can't happen"
- else
- print "everything is normal"
- @end example
-
- @noindent
- should print @samp{everything is normal}, because @code{NF+1} is certain
- to be out of range. (@xref{If Statement}, for more information about
- @code{awk}'s @code{if-else} statements.)
-
- @node Field Separators, Multiple Line, Changing Fields, Reading Files
- @section Specifying How Fields Are Separated
- @vindex FS
- @cindex fields, separating
- @cindex field separator, @code{FS}
- @cindex @samp{-F} option
-
- The way @code{awk} splits an input record into fields is controlled by
- the @dfn{field separator}, which is a single character or a regular
- expression. @code{awk} scans the input record for matches for the
- separator; the fields themselves are the text between the matches. For
- example, if the field separator is @samp{oo}, then the following line:
-
- @example
- moo goo gai pan
- @end example
-
- @noindent
- would be split into three fields: @samp{m}, @samp{@ g} and @samp{@ gai@
- pan}.
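You can check this split yourself. The sketch below sets @code{FS} to @code{"oo"} in a @code{BEGIN} rule and prints each field in brackets so that the leading spaces are visible. It assumes an @code{awk} that treats a multi-character @code{FS} as a regular expression.

```shell
# Split on "oo" and show every field, one per line, inside brackets.
pieces=$(echo 'moo goo gai pan' |
    awk 'BEGIN { FS = "oo" } { for (i = 1; i <= NF; i++) print "[" $i "]" }')
printf '%s\n' "$pieces"
```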
-
- The field separator is represented by the built-in variable @code{FS}.
- Shell programmers take note! @code{awk} does not use the name
- @code{IFS} which is used by the shell.@refill
-
- You can change the value of @code{FS} in the @code{awk} program with the
- assignment operator, @samp{=} (@pxref{Assignment Ops}). Often the right
- time to do this is at the beginning of execution, before any input has
- been processed, so that the very first record will be read with the
- proper separator. To do this, use the special @code{BEGIN} pattern
- (@pxref{BEGIN/END}). For example, here we set the value of @code{FS} to
- the string @code{","}:
-
- @example
- awk 'BEGIN @{ FS = "," @} ; @{ print $2 @}'
- @end example
-
- @noindent
- Given the input line,
-
- @example
- John Q. Smith, 29 Oak St., Walamazoo, MI 42139
- @end example
-
- @noindent
- this @code{awk} program extracts the string @samp{29 Oak St.}.
-
- @cindex field separator, choice of
- @cindex regular expressions as field separators
- Sometimes your input data will contain separator characters that don't
- separate fields the way you thought they would. For instance, the
- person's name in the example we've been using might have a title or
- suffix attached, such as @samp{John Q. Smith, LXIX}. From input
- containing such a name:
-
- @example
- John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
- @end example
-
- @noindent
- the previous sample program would extract @samp{LXIX}, instead of
- @samp{29 Oak St.}. If you were expecting the program to print the
- address, you would be surprised. So choose your data layout and
- separator characters carefully to prevent such problems.
-
- As you know, by default, fields are separated by whitespace sequences
- (spaces and tabs), not by single spaces: two spaces in a row do not
- delimit an empty field. The default value of the field separator is a
- string @w{@code{" "}} containing a single space. If this value were
- interpreted in the usual way, each space character would separate
- fields, so two spaces in a row would make an empty field between them.
- The reason this does not happen is that a single space as the value of
- @code{FS} is a special case: it is taken to specify the default manner
- of delimiting fields.
-
- If @code{FS} is any other single character, such as @code{","}, then
- each occurrence of that character separates two fields. Two consecutive
- occurrences delimit an empty field. If the character occurs at the
- beginning or the end of the line, that too delimits an empty field. The
- space character is the only single character which does not follow these
- rules.
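The following sketch shows these rules on a made-up line containing two commas in a row and a trailing comma. Each field is printed in brackets with its number, so the empty fields show up.

```shell
# "a,,b," has four comma-separated fields; the second and fourth are empty.
listing=$(echo 'a,,b,' |
    awk -F, '{ print NF; for (i = 1; i <= NF; i++) print i ": [" $i "]" }')
printf '%s\n' "$listing"
```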
-
- More generally, the value of @code{FS} may be a string containing any
- regular expression. Then each match in the record for the regular
- expression separates fields. For example, the assignment:@refill
-
- @example
- FS = ", \t"
- @end example
-
- @noindent
- makes every area of an input line that consists of a comma followed by a
- space and a tab, into a field separator. (@samp{\t} stands for a
- tab.)@refill
-
- For a less trivial example of a regular expression, suppose you want
- single spaces to separate fields the way single commas were used above.
- You can set @code{FS} to @w{@code{"[@ ]"}}. This regular expression
- matches a single space and nothing else.
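To see the difference from the default behavior, this sketch counts the fields of a made-up record that contains two spaces in a row, first with the default @code{FS} and then with @code{FS} set to @code{"[ ]"}.

```shell
# Default FS: a run of spaces is one separator, so there are two fields.
default_nf=$(echo 'a  b' | awk '{ print NF }')

# FS = "[ ]": every single space separates, so an empty field appears.
single_nf=$(echo 'a  b' | awk 'BEGIN { FS = "[ ]" } { print NF }')

printf '%s %s\n' "$default_nf" "$single_nf"
```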
-
- @cindex field separator, setting on command line
- @cindex command line, setting @code{FS} on
- @code{FS} can be set on the command line. You use the @samp{-F} argument to
- do so. For example:
-
- @example
- awk -F, '@var{program}' @var{input-files}
- @end example
-
- @noindent
- sets @code{FS} to be the @samp{,} character. Notice that the argument uses
- a capital @samp{F}. Contrast this with @samp{-f}, which specifies a file
- containing an @code{awk} program. Case is significant in command options:
- the @samp{-F} and @samp{-f} options have nothing to do with each other.
- You can use both options at the same time to set the @code{FS} variable
- @emph{and} get an @code{awk} program from a file.
-
- As a special case, in compatibility mode (@pxref{Command Line}), if the
- argument to @samp{-F} is @samp{t}, then @code{FS} is set to the tab
- character. (This is because if you type @samp{-F\t}, without the quotes,
- at the shell, the @samp{\} gets deleted, so @code{awk} figures that you
- really want your fields to be separated with tabs, and not @samp{t}s.
- Use @samp{FS="t"} on the command line if you really do want to separate
- your fields with @samp{t}s.)
-
- For example, let's use an @code{awk} program file called @file{baud.awk}
- that contains the pattern @code{/300/}, and the action @samp{print $1}.
- Here is the program:
-
- @example
- /300/ @{ print $1 @}
- @end example
-
- Let's also set @code{FS} to be the @samp{-} character, and run the
- program on the file @file{BBS-list}. The following command prints a
- list of the names of the bulletin boards that operate at 300 baud and
- the first three digits of their phone numbers:@refill
-
- @example
- awk -F- -f baud.awk BBS-list
- @end example
-
- @noindent
- It produces this output:
-
- @example
- aardvark 555
- alpo
- barfly 555
- bites 555
- camelot 555
- core 555
- fooey 555
- foot 555
- macfoo 555
- sdace 555
- sabafoo 555
- @end example
-
- @noindent
- Note the second line of output. If you check the original file, you will
- see that the second line looked like this:
-
- @example
- alpo-net 555-3412 2400/1200/300 A
- @end example
-
- The @samp{-} as part of the system's name was used as the field
- separator, instead of the @samp{-} in the phone number that was
- originally intended. This demonstrates why you have to be careful in
- choosing your field and record separators.
-
- The following program searches the system password file, and prints
- the entries for users who have no password:
-
- @example
- awk -F: '$2 == ""' /etc/passwd
- @end example
-
- @noindent
- Here we use the @samp{-F} option on the command line to set the field
- separator. Note that fields in @file{/etc/passwd} are separated by
- colons. The second field represents a user's encrypted password, but if
- the field is empty, that user has no password.
-
- @node Multiple Line, Getline, Field Separators, Reading Files
- @section Multiple-Line Records
-
- @cindex multiple line records
- @cindex input, multiple line records
- @cindex reading files, multiple line records
- @cindex records, multiple line
- In some databases, a single line cannot conveniently hold all the
- information in one entry. In such cases, you can use multi-line
- records.
-
- The first step in doing this is to choose your data format: when records
- are not defined as single lines, how do you want to define them?
- What should separate records?
-
- One technique is to use an unusual character or string to separate
- records. For example, you could use the formfeed character (written
- @samp{\f} in @code{awk}, as in C) to separate them, making each record
- a page of the file. To do this, just set the variable @code{RS} to
- @code{"\f"} (a string containing the formfeed character). Any
- other character could equally well be used, as long as it won't be part
- of the data in a record.
-
- @ignore
- Another technique is to have blank lines separate records. The string
- @code{"^\n+"} is a regular expression that matches any sequence of
- newlines starting at the beginning of a line---in other words, it
- matches a sequence of blank lines. If you set @code{RS} to this string,
- a record always ends at the first blank line encountered. In
- addition, a regular expression always matches the longest possible
- sequence when there is a choice. So the next record doesn't start until
- the first nonblank line that follows---no matter how many blank lines
- appear in a row, they are considered one record-separator.
- @end ignore
-
- Another technique is to have blank lines separate records. By a special
- dispensation, a null string as the value of @code{RS} indicates that
- records are separated by one or more blank lines. If you set @code{RS}
- to the null string, a record always ends at the first blank line
- encountered. And the next record doesn't start until the first nonblank
- line that follows---no matter how many blank lines appear in a row, they
- are considered one record-separator.
-
- The second step is to separate the fields in the record. One way to do
- this is to put each field on a separate line: to do this, just set the
- variable @code{FS} to the string @code{"\n"}. (This simple regular
- expression matches a single newline.)
-
- Another idea is to divide each of the lines into fields in the normal
- manner. This happens by default as a result of a special feature: when
- @code{RS} is set to the null string, the newline character @emph{always}
- acts as a field separator. This is in addition to whatever field
- separations result from @code{FS}.
-
- The original motivation for this special exception was probably so that
- you get useful behavior in the default case (i.e., @w{@code{FS == "
- "}}). This feature can be a problem if you really don't want the
- newline character to separate fields, since there is no way to
- prevent it. However, you can work around this by using the @code{split}
- function to break up the record manually (@pxref{String Functions}).
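Both approaches can be tried on a small address list. This is only a sketch: the two entries are invented, and they are separated by two blank lines to show that a run of blank lines counts as one separator. The first command puts each line in its own field; the second leaves @code{FS} at its default, so the newline separates fields along with spaces and tabs.

```shell
input='Jane Doe
123 Main St.


John Roe
9 Elm Ave.'

# One field per line: RS = "" selects blank-line records, FS = "\n" splits lines.
by_line=$(printf '%s\n' "$input" |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " | " $2 }')
printf '%s\n' "$by_line"

# Default FS: words on all lines of the record become fields.
by_word=$(printf '%s\n' "$input" |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, last is " $NF }')
printf '%s\n' "$by_word"
```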
-
- @ignore
- Here are two ways to use records separated by blank lines and break each
- line into fields normally:
-
- @example
- awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
-
- @exdent @r{or}
-
- awk 'BEGIN @{ RS = "^\n+"; FS = "[ \t\n]+" @} @{ print $1 @}' BBS-list
- @end example
- @end ignore
-
- @ignore
- Here is how to use records separated by blank lines and break each
- line into fields normally:
-
- @example
- awk 'BEGIN @{ RS = ""; FS = "[ \t\n]+" @} ; @{ print $1 @}' BBS-list
- @end example
- @end ignore
-
- @node Getline, Close Input, Multiple Line, Reading Files
- @section Explicit Input with @code{getline}
-
- @findex getline
- @cindex input, explicit
- @cindex explicit input
- @cindex input, @code{getline} command
- @cindex reading files, @code{getline} command
- So far we have been getting our input from @code{awk}'s main
- input stream---either the standard input (usually your terminal) or the
- files specified on the command line. The @code{awk} language has a
- special built-in command called @code{getline} that
- can be used to read input under your explicit control.
-
- This command is quite complex and should @emph{not} be used by
- beginners. It is covered here because this is the chapter on input.
- The examples that follow the explanation of the @code{getline} command
- include material that has not been covered yet. Therefore, come back
- and study the @code{getline} command @emph{after} you have reviewed the
- rest of this manual and have a good knowledge of how @code{awk} works.
-
- @code{getline} returns 1 if it finds a record, and 0 if the end of the
- file is encountered. If there is some error in getting a record, such
- as a file that cannot be opened, then @code{getline} returns @minus{}1.
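The three return values can be observed with a one-line file and a file that does not exist. This is a sketch under some assumptions: the file names are hypothetical paths in a temporary directory, and the @samp{-v} option (used here to pass the names in) is assumed to be supported by your @code{awk}.

```shell
dir=$(mktemp -d)
printf 'only line\n' > "$dir/data"

codes=$(awk -v f="$dir/data" -v missing="$dir/no-such-file" 'BEGIN {
    r1 = (getline line < f)        # a record is available
    r2 = (getline line < f)        # the end of the file has been reached
    r3 = (getline line < missing)  # the file cannot be opened
    print r1, r2, r3
}')
printf '%s\n' "$codes"

rm -r "$dir"
```

Because of the @minus{}1 case, a loop should test for a result greater than zero rather than for a nonzero result.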
-
- In the following examples, @var{command} stands for a string value that
- represents a shell command.
-
- @table @code
- @item getline
- The @code{getline} command can be used without arguments to read input
- from the current input file. All it does in this case is read the next
- input record and split it up into fields. This is useful if you've
- finished processing the current record, but you want to do some special
- processing @emph{right now} on the next record. Here's an
- example:@refill
-
- @example
- awk '@{
-     if (t = index($0, "/*")) @{
-         # keep any text that precedes the start of the comment
-         if (t > 1)
-             tmp = substr($0, 1, t - 1)
-         else
-             tmp = ""
-         u = index(substr($0, t + 2), "*/")
-         while (! u) @{
-             # comment continues: stop at end of input or on a read error
-             if ((getline) <= 0)
-                 exit
-             t = -1
-             u = index($0, "*/")
-         @}
-         if (u <= length($0) - 2)
-             $0 = tmp substr($0, t + u + 3)
-         else
-             $0 = tmp
-     @}
-     print $0
- @}'
- @end example
-
- This @code{awk} program deletes all comments, @samp{/* @dots{}
- */}, from the input. By replacing the @samp{print $0} with other
- statements, you could perform more complicated processing on the
- decommented input, such as searching it for matches for a regular
- expression.
-
- This form of the @code{getline} command sets @code{NF} (the number of
- fields; @pxref{Fields}), @code{NR} (the number of records read so far;
- @pxref{Records}), @code{FNR} (the number of records read from this input
- file), and the value of @code{$0}.
-
- @strong{Note:} the new value of @code{$0} is used in testing
- the patterns of any subsequent rules. The original value
- of @code{$0} that triggered the rule which executed @code{getline}
- is lost. By contrast, the @code{next} statement reads a new record
- but immediately begins processing it normally, starting with the first
- rule in the program. @xref{Next Statement}.
-
- @item getline @var{var}
- This form of @code{getline} reads a record into the variable @var{var}.
- This is useful when you want your program to read the next record from
- the current input file, but you don't want to subject the record to the
- normal input processing.
-
- For example, suppose the next line is a comment, or a special string,
- and you want to read it, but you must make certain that it won't trigger
- any rules. This version of @code{getline} allows you to read that line
- and store it in a variable so that the main
- read-a-line-and-check-each-rule loop of @code{awk} never sees it.
-
- The following example swaps every two lines of input. For example, given:
-
- @example
- wan
- tew
- free
- phore
- @end example
-
- @noindent
- it outputs:
-
- @example
- tew
- wan
- phore
- free
- @end example
-
- @noindent
- Here's the program:
-
- @example
- awk '@{
- if ((getline tmp) > 0) @{
- print tmp
- print $0
- @} else
- print $0
- @}'
- @end example
-
- The @code{getline} function used in this way sets only the variables
- @code{NR} and @code{FNR} (and of course, @var{var}). The record is not
- split into fields, so the values of the fields (including @code{$0}) and
- the value of @code{NF} do not change.@refill
-
- @item getline < @var{file}
- @cindex input redirection
- @cindex redirection of input
- This form of the @code{getline} function takes its input from the file
- @var{file}. Here @var{file} is a string-valued expression that
- specifies the file name. @samp{< @var{file}} is called a @dfn{redirection}
- since it directs input to come from a different place.
-
- This form is useful if you want to read your input from a particular
- file, instead of from the main input stream. For example, the following
- program reads its input record from the file @file{foo.input} when it
- encounters a first field with a value equal to 10 in the current input
- file.@refill
-
- @example
- awk '@{
- if ($1 == 10) @{
- getline < "foo.input"
- print
- @} else
- print
- @}'
- @end example
-
- Since the main input stream is not used, the values of @code{NR} and
- @code{FNR} are not changed. But the record read is split into fields in
- the normal manner, so the values of @code{$0} and other fields are
- changed. So is the value of @code{NF}.
-
- This does not cause the record to be tested against all the patterns
- in the @code{awk} program, in the way that would happen if the record
- were read normally by the main processing loop of @code{awk}. However,
- the new record is tested against any subsequent rules, just as when
- @code{getline} is used without a redirection.
-
- @item getline @var{var} < @var{file}
- This form of the @code{getline} function takes its input from the file
- @var{file} and puts it in the variable @var{var}. As above, @var{file}
- is a string-valued expression that specifies the file to read from.
-
- In this version of @code{getline}, none of the built-in variables are
- changed, and the record is not split into fields. The only variable
- changed is @var{var}.
-
- For example, the following program copies all the input files to the
- output, except for records that say @w{@samp{@@include @var{filename}}}.
- Such a record is replaced by the contents of the file
- @var{filename}.@refill
-
- @example
- awk '@{
- if (NF == 2 && $1 == "@@include") @{
- while ((getline line < $2) > 0)
- print line
- close($2)
- @} else
- print
- @}'
- @end example
-
- Note here how the name of the extra input file is not built into
- the program; it is taken from the data, from the second field on
- the @samp{@@include} line.
-
- The @code{close} function is called to ensure that if two identical
- @samp{@@include} lines appear in the input, the entire specified file is
- included twice. @xref{Close Input}.
-
- One deficiency of this program is that it does not process nested
- @samp{@@include} statements the way a true macro preprocessor would.
-
- @item @var{command} | getline
- You can @dfn{pipe} the output of a command into @code{getline}. A pipe is
- simply a way to link the output of one program to the input of another. In
- this case, the string @var{command} is run as a shell command and its output
- is piped into @code{awk} to be used as input. This form of @code{getline}
- reads one record from the pipe.
-
- For example, the following program copies input to output, except for lines
- that begin with @samp{@@execute}, which are replaced by the output produced by
- running the rest of the line as a shell command:
-
- @example
- awk '@{
- if ($1 == "@@execute") @{
- tmp = substr($0, 10)
- while ((tmp | getline) > 0)
- print
- close(tmp)
- @} else
- print
- @}'
- @end example
-
- @noindent
- The @code{close} function is called to ensure that if two identical
- @samp{@@execute} lines appear in the input, the command is run again for
- each one. @xref{Close Input}.
-
- Given the input:
-
- @example
- foo
- bar
- baz
- @@execute who
- bletch
- @end example
-
- @noindent
- the program might produce:
-
- @example
- foo
- bar
- baz
- hack ttyv0 Jul 13 14:22
- hack ttyp0 Jul 13 14:23 (gnu:0)
- hack ttyp1 Jul 13 14:23 (gnu:0)
- hack ttyp2 Jul 13 14:23 (gnu:0)
- hack ttyp3 Jul 13 14:23 (gnu:0)
- bletch
- @end example
-
- @noindent
- Notice that this program ran the command @code{who} and printed the result.
- (If you try this program yourself, you will get different results, showing
- the users logged in on your system.)
-
- This variation of @code{getline} sets @code{$0} to the record read,
- splits it into fields, and sets the value of @code{NF}. The values of
- @code{NR} and @code{FNR} are not changed.
-
- @item @var{command} | getline @var{var}
- The output of the command @var{command} is sent through a pipe to
- @code{getline} and into the variable @var{var}. For example, the
- following program reads the current date and time into the variable
- @code{current_time}, using the utility called @code{date}, and then
- prints it.@refill
-
- @group
- @example
- awk 'BEGIN @{
-     "date" | getline current_time
-     close("date")
-     print "Report printed on " current_time
- @}'
- @end example
- @end group
-
- In this version of @code{getline}, none of the built-in variables are
- changed, and the record is not split into fields.
- @end table
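-
- The difference between the two pipe forms can be seen in a small
- experiment. The following is only a sketch: the @code{echo} commands
- and the variable name @code{v} are made up for the illustration, and
- it assumes a POSIX shell with @code{awk} on the search path.
-
```shell
# Compare "command | getline var" with plain "command | getline".
out=$(echo 'a b c' | awk '{
    "echo x y" | getline v      # only v is set; $0 and NF are untouched
    print NF, $0, v
    close("echo x y")           # so that the command is run again
    "echo x y" | getline        # replaces $0 and resets NF
    print NF, $0
}')
printf '%s\n' "$out"
```
-
- The first line printed still shows the three fields of the original
- input record; the second shows the two fields read from the pipe.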
-
- @node Close Input,, Getline, Reading Files
- @section Closing Input Files and Pipes
- @cindex closing input files and pipes
- @findex close
-
- If the same file name or the same shell command is used with
- @code{getline} more than once during the execution of an @code{awk}
- program, the file is opened (or the command is executed) only the first time.
- At that time, the first record of input is read from that file or command.
- The next time the same file or command is used in @code{getline}, another
- record is read from it, and so on.
-
- This implies that if you want to start reading the same file again from
- the beginning, or if you want to rerun a shell command (rather than
- reading more output from the command), you must take special steps.
- What you can do is use the @code{close} function, as follows:
-
- @example
- close(@var{filename})
- @end example
-
- @noindent
- or
-
- @example
- close(@var{command})
- @end example
-
- The argument @var{filename} or @var{command} can be any expression. Its
- value must exactly equal the string that was used to open the file or
- start the command---for example, if you open a pipe with this:
-
- @example
- "sort -r names" | getline foo
- @end example
-
- @noindent
- then you must close it with this:
-
- @example
- close("sort -r names")
- @end example
-
- Once this function call is executed, the next @code{getline} from that
- file or command will reopen the file or rerun the command.
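-
- As a rough illustration of this behavior, the sketch below reads from
- the same command string three times, calling @code{close} before the
- third read. The command @code{echo one; echo two} and the variable
- names are invented for the example; a POSIX shell is assumed.
-
```shell
# Reading twice from one pipe continues where the last read stopped;
# after close(), the same command string starts a fresh run.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    cmd | getline a     # starts the command, reads its first record
    cmd | getline b     # same pipe: reads the second record
    close(cmd)
    cmd | getline c     # command is run again: first record once more
    close(cmd)
    print a, b, c
}')
printf '%s\n' "$out"
```
-
- Storing the command in a variable, as done here, makes it easy to pass
- exactly the same string to @code{close} that was used to open the pipe.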
-
- @node Printing, One-liners, Reading Files, Top
- @chapter Printing Output
-
- @cindex printing
- @cindex output
- One of the most common things that actions do is to output or @dfn{print}
- some or all of the input. For simple output, use the @code{print}
- statement. For fancier formatting use the @code{printf} statement.
- Both are described in this chapter.
-
- @menu
- * Print:: The @code{print} statement.
- * Print Examples:: Simple examples of @code{print} statements.
- * Output Separators:: The output separators and how to change them.
- * Printf:: The @code{printf} statement.
- * Redirection:: How to redirect output to multiple files and pipes.
- * Special Files:: File name interpretation in @code{gawk}. @code{gawk}
- allows access to inherited file descriptors.
- @end menu
-
- @node Print, Print Examples, Printing, Printing
- @section The @code{print} Statement
- @cindex @code{print} statement
-
- The @code{print} statement does output with simple, standardized
- formatting. You specify only the strings or numbers to be printed, in a
- list separated by commas. They are output, separated by single spaces,
- followed by a newline. The statement looks like this:
-
- @example
- print @var{item1}, @var{item2}, @dots{}
- @end example
-
- @noindent
- The entire list of items may optionally be enclosed in parentheses. The
- parentheses are necessary if any of the item expressions uses a
- relational operator; otherwise it could be confused with a redirection
- (@pxref{Redirection}). The relational operators are @samp{==},
- @samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
- @samp{!~} (@pxref{Comparison Ops}).@refill
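-
- A minimal sketch of the parenthesized form follows; the numbers are
- arbitrary. Written without the parentheses, @samp{print 3 > 2} would
- be taken as a redirection to a file named @file{2} rather than as a
- comparison.
-
```shell
# Inside the parentheses, > is a comparison, not a redirection.
out=$(awk 'BEGIN { print (3 > 2, 2 > 3) }')
printf '%s\n' "$out"
```
-
- Each comparison yields 1 when it is true and 0 when it is false, and
- the two results are printed as two items.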
-
- The items printed can be constant strings or numbers, fields of the
- current record (such as @code{$1}), variables, or any @code{awk}
- expressions. The @code{print} statement is completely general for
- computing @emph{what} values to print. With one exception
- (@pxref{Output Separators}), what you can't do is specify @emph{how} to
- print them---how many columns to use, whether to use exponential
- notation or not, and so on. For that, you need the @code{printf}
- statement (@pxref{Printf}).
-
- The simple statement @samp{print} with no items is equivalent to
- @samp{print $0}: it prints the entire current record. To print a blank
- line, use @samp{print ""}, where @code{""} is the null, or empty,
- string.
-
- To print a fixed piece of text, use a string constant such as
- @w{@code{"Hello there"}} as one item. If you forget to use the
- double-quote characters, your text will be taken as an @code{awk}
- expression, and you will probably get an error. Keep in mind that a
- space is printed between any two items.
-
- Most often, each @code{print} statement makes one line of output. But it
- isn't limited to one line. If an item value is a string that contains a
- newline, the newline is output along with the rest of the string. A
- single @code{print} can make any number of lines this way.
-
- @node Print Examples, Output Separators, Print, Printing
- @section Examples of @code{print} Statements
-
- Here is an example of printing a string that contains embedded newlines:
-
- @example
- awk 'BEGIN @{ print "line one\nline two\nline three" @}'
- @end example
-
- @noindent
- produces output like this:
-
- @example
- line one
- line two
- line three
- @end example
-
- Here is an example that prints the first two fields of each input record,
- with a space between them:
-
- @example
- awk '@{ print $1, $2 @}' inventory-shipped
- @end example
-
- @noindent
- Its output looks like this:
-
- @example
- Jan 13
- Feb 15
- Mar 15
- @dots{}
- @end example
-
- A common mistake in using the @code{print} statement is to omit the comma
- between two items. This often has the effect of making the items run
- together in the output, with no space. The reason for this is that
- juxtaposing two string expressions in @code{awk} means to concatenate
- them. For example, without the comma:
-
- @example
- awk '@{ print $1 $2 @}' inventory-shipped
- @end example
-
- @noindent
- prints:
-
- @example
- Jan13
- Feb15
- Mar15
- @dots{}
- @end example
-
- Neither example's output makes much sense to someone unfamiliar with the
- file @file{inventory-shipped}. A heading line at the beginning would make
- it clearer. Let's add some headings to our table of months (@code{$1}) and
- green crates shipped (@code{$2}). We do this using the @code{BEGIN} pattern
- (@pxref{BEGIN/END}) to cause the headings to be printed only once:
-
- @c the formatting is strange here because the @{ becomes just a brace.
- @example
- awk 'BEGIN @{ print "Month Crates"
- print "----- ------" @}
- @{ print $1, $2 @}' inventory-shipped
- @end example
-
- @noindent
- Did you already guess what happens? This program prints the following:
-
- @group
- @example
- Month Crates
- ----- ------
- Jan 13
- Feb 15
- Mar 15
- @dots{}
- @end example
- @end group
-
- @noindent
- The headings and the table data don't line up! We can fix this by printing
- some spaces between the two fields:
-
- @example
- awk 'BEGIN @{ print "Month Crates"
- print "----- ------" @}
- @{ print $1, " ", $2 @}' inventory-shipped
- @end example
-
- You can imagine that this way of lining up columns can get pretty
- complicated when you have many columns to fix. Counting spaces for two
- or three columns can be simple, but more than this and you can get
- ``lost'' quite easily. This is why the @code{printf} statement was
- created (@pxref{Printf}); one of its specialties is lining up columns of
- data.
-
- @node Output Separators, Printf, Print Examples, Printing
- @section Output Separators
-
- @cindex output field separator, @code{OFS}
- @vindex OFS
- @vindex ORS
- @cindex output record separator, @code{ORS}
- As mentioned previously, a @code{print} statement contains a list
- of items, separated by commas. In the output, the items are normally
- separated by single spaces. But they do not have to be spaces; a
- single space is only the default. You can specify any string of
- characters to use as the @dfn{output field separator} by setting the
- built-in variable @code{OFS}. The initial value of this variable
- is the string @w{@code{" "}}.
-
- The output from an entire @code{print} statement is called an
- @dfn{output record}. Each @code{print} statement outputs one output
- record and then outputs a string called the @dfn{output record separator}.
- The built-in variable @code{ORS} specifies this string. The initial
- value of the variable is the string @code{"\n"} containing a newline
- character; thus, normally each @code{print} statement makes a separate line.
-
- You can change how output fields and records are separated by assigning
- new values to the variables @code{OFS} and/or @code{ORS}. The usual
- place to do this is in the @code{BEGIN} rule (@pxref{BEGIN/END}), so
- that it happens before any input is processed. You may also do this
- with assignments on the command line, before the names of your input
- files.
-
- The following example prints the first and second fields of each input
- record separated by a semicolon, with a blank line added after each
- line:@refill
-
- @example
- awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @}
- @{ print $1, $2 @}' BBS-list
- @end example
-
- If the value of @code{ORS} does not contain a newline, all your output
- will be run together on a single line, unless you output newlines some
- other way.
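-
- The command-line form of assignment, and the effect of an @code{ORS}
- without a newline, can be tried with the sketch below. The two input
- lines are made up, and @samp{-} is given as the file name so that
- the assignments precede an input file operand (standard input here).
-
```shell
# OFS and ORS are assigned on the command line, before the input file.
out=$(printf 'a b\nc d\n' | awk '{ print $1, $2 }' OFS=';' ORS='|' -)
printf '%s\n' "$out"
```
-
- Because this @code{ORS} contains no newline, both output records end
- up on one line.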
-
- @node Printf, Redirection, Output Separators, Printing
- @section Using @code{printf} Statements For Fancier Printing
- @cindex formatted output
- @cindex output, formatted
-
- If you want more precise control over the output format than
- @code{print} gives you, use @code{printf}. With @code{printf} you can
- specify the width to use for each item, and you can specify various
- stylistic choices for numbers (such as what radix to use, whether to
- print an exponent, whether to print a sign, and how many digits to print
- after the decimal point). You do this by specifying a string, called
- the @dfn{format string}, which controls how and where to print the other
- arguments.
-
- @menu
- * Basic Printf:: Syntax of the @code{printf} statement.
- * Control Letters:: Format-control letters.
- * Format Modifiers:: Format-specification modifiers.
- * Printf Examples:: Several examples.
- @end menu
-
- @node Basic Printf, Control Letters, Printf, Printf
- @subsection Introduction to the @code{printf} Statement
-
- @cindex @code{printf} statement, syntax of
- The @code{printf} statement looks like this:@refill
-
- @example
- printf @var{format}, @var{item1}, @var{item2}, @dots{}
- @end example
-
- @noindent
- The entire list of items may optionally be enclosed in parentheses. The
- parentheses are necessary if any of the item expressions uses a
- relational operator; otherwise it could be confused with a redirection
- (@pxref{Redirection}). The relational operators are @samp{==},
- @samp{!=}, @samp{<}, @samp{>}, @samp{>=}, @samp{<=}, @samp{~} and
- @samp{!~} (@pxref{Comparison Ops}).@refill
-
- @cindex format string
- The difference between @code{printf} and @code{print} is the argument
- @var{format}. This is an expression whose value is taken as a string; its
- job is to say how to output each of the other arguments. It is called
- the @dfn{format string}.
-
- The format string is essentially the same as in the C library function
- @code{printf}. Most of @var{format} is text to be output verbatim.
- Scattered among this text are @dfn{format specifiers}, one per item.
- Each format specifier says to output the next item at that place in the
- format.@refill
-
- The @code{printf} statement does not automatically append a newline to its
- output. It outputs nothing but what the format specifies. So if you want
- a newline, you must include one in the format. The output separator
- variables @code{OFS} and @code{ORS} have no effect on @code{printf}
- statements.
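-
- The following sketch contrasts the two statements. The separator
- values are chosen only to make the difference visible; a POSIX shell
- is assumed.
-
```shell
# printf ignores OFS and ORS; print uses both.
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    printf "%s %s", "a", "b"    # no separators and no newline are added
    printf "\n"                 # the newline has to be requested
    print "c", "d"              # print inserts OFS and appends ORS
}')
printf '%s\n' "$out"
```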
-
- @node Control Letters, Format Modifiers, Basic Printf, Printf
- @subsection Format-Control Letters
- @cindex @code{printf}, format-control characters
- @cindex format specifier
-
- A format specifier starts with the character @samp{%} and ends with a
- @dfn{format-control letter}; it tells the @code{printf} statement how
- to output one item. (If you actually want to output a @samp{%}, write
- @samp{%%}.) The format-control letter specifies what kind of value to
- print. The rest of the format specifier is made up of optional
- @dfn{modifiers} which are parameters such as the field width to use.
-
- Here is a list of the format-control letters:
-
- @table @samp
- @item c
- This prints a number as an ASCII character. Thus, @samp{printf "%c",
- 65} outputs the letter @samp{A}. The output for a string value is
- the first character of the string.
-
- @item d
- This prints a decimal integer.
-
- @item i
- This also prints a decimal integer.
-
- @item e
- This prints a number in scientific (exponential) notation.
- For example,
-
- @example
- printf "%4.3e", 1950
- @end example
-
- @noindent
- prints @samp{1.950e+03}. The @samp{.3} asks for 3 digits after the
- decimal point, giving 4 significant figures in all; the @samp{4} is a
- minimum field width, which this output already exceeds. The @samp{4.3}
- are @dfn{modifiers}, discussed below.
-
- @item f
- This prints a number in floating point notation.
-
- @item g
- This prints either scientific notation or floating point notation, whichever
- is shorter.
-
- @item o
- This prints an unsigned octal integer.
-
- @item s
- This prints a string.
-
- @item x
- This prints an unsigned hexadecimal integer.
-
- @item X
- This prints an unsigned hexadecimal integer. However, for the values 10
- through 15, it uses the letters @samp{A} through @samp{F} instead of
- @samp{a} through @samp{f}.
-
- @item %
- This isn't really a format-control letter, but it does have a meaning
- when used after a @samp{%}: the sequence @samp{%%} outputs one
- @samp{%}. It does not consume an argument.
- @end table
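-
- Several of these letters can be tried together, as in the sketch
- below. The values are arbitrary, and the @samp{.3} and @samp{.2}
- precision modifiers are used only to keep the output short.
-
```shell
# One item per format specifier; %% prints a percent sign and uses no item.
out=$(awk 'BEGIN {
    printf "%c %d %o %x %X\n", 65, 42, 8, 255, 255
    printf "%.3e %.2f %g %s %%\n", 1950, 3.14159, 0.5, "str"
}')
printf '%s\n' "$out"
```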
-
- @node Format Modifiers, Printf Examples, Control Letters, Printf
- @subsection Modifiers for @code{printf} Formats
-
- @cindex @code{printf}, modifiers
- @cindex modifiers (in format specifiers)
- A format specification can also include @dfn{modifiers} that can control
- how much of the item's value is printed and how much space it gets. The
- modifiers come between the @samp{%} and the format-control letter. Here
- are the possible modifiers, in the order in which they may appear:
-
- @table @samp
- @item -
- The minus sign, used before the width modifier, says to left-justify
- the argument within its specified width. Normally the argument
- is printed right-justified in the specified width. Thus,
-
- @example
- printf "%-4s", "foo"
- @end example
-
- @noindent
- prints @samp{foo }.
-
- @item @var{width}
- This is a number representing the desired width of a field. Inserting any
- number between the @samp{%} sign and the format control character forces the
- field to be expanded to this width. The default way to do this is to
- pad with spaces on the left. For example,
-
- @example
- printf "%4s", "foo"
- @end example
-
- @noindent
- prints @samp{ foo}.
-
- The value of @var{width} is a minimum width, not a maximum. If the item
- value requires more than @var{width} characters, it can be as wide as
- necessary. Thus,
-
- @example
- printf "%4s", "foobar"
- @end example
-
- @noindent
- prints @samp{foobar}. Preceding the @var{width} with a minus sign causes
- the output to be padded with spaces on the right, instead of on the left.
-
- @item .@var{prec}
- This is a number that specifies the precision to use when printing.
- This specifies the number of digits you want printed to the right of the
- decimal point. For a string, it specifies the maximum number of
- characters from the string that should be printed.
- @end table
-
- The C library @code{printf}'s dynamic @var{width} and @var{prec}
- capability (for example, @code{"%*.*s"}) is not yet supported. However, it can
- easily be simulated using concatenation to dynamically build the
- format string.@refill
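-
- One way to do this is sketched here. The variable names @code{width}
- and @code{fmt} are hypothetical, and the value 6 stands in for a width
- that would normally be computed while the program runs.
-
```shell
# Build the format string by concatenation, then hand it to printf.
out=$(awk 'BEGIN {
    width = 6                                # stand-in for a computed width
    fmt = "%" width "s|%-" width "s|%.2s\n"  # becomes "%6s|%-6s|%.2s\n"
    printf fmt, "ab", "ab", "abcdef"
}')
printf '%s\n' "$out"
```
-
- The three items show, in turn, padding on the left, padding on the
- right, and a string cut down to its first two characters.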
-
- @node Printf Examples, , Format Modifiers, Printf
- @subsection Examples of Using @code{printf}
-
- Here is how to use @code{printf} to make an aligned table:
-
- @example
- awk '@{ printf "%-10s %s\n", $1, $2 @}' BBS-list
- @end example
-
- @noindent
- prints the names of bulletin boards (@code{$1}) of the file
- @file{BBS-list} as a string of 10 characters, left justified. It also
- prints the phone numbers (@code{$2}) afterward on the line. This
- produces an aligned two-column table of names and phone numbers:
-
- @example
- aardvark   555-5553
- alpo-net   555-3412
- barfly     555-7685
- bites      555-1675
- camelot    555-0542
- core       555-2912
- fooey      555-1234
- foot       555-6699
- macfoo     555-6480
- sdace      555-3430
- sabafoo    555-2127
- @end example
-
- Did you notice that we did not specify that the phone numbers be printed
- as numbers? They had to be printed as strings because the numbers are
- separated by a dash. This dash would be interpreted as a minus sign if
- we had tried to print the phone numbers as numbers. This would have led
- to some pretty confusing results.
-
- We did not specify a width for the phone numbers because they are the
- last things on their lines. We don't need to put spaces after them.
-
- We could make our table look even nicer by adding headings to the tops
- of the columns. To do this, use the @code{BEGIN} pattern
- (@pxref{BEGIN/END}) to cause the header to be printed only once, at the
- beginning of the @code{awk} program:
-
- @example
- awk 'BEGIN @{ print "Name       Number"
-              print "----       ------" @}
-      @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
- @end example
-
- Did you notice that we mixed @code{print} and @code{printf} statements in
- the above example? We could have used just @code{printf} statements to get
- the same results:
-
- @example
- awk 'BEGIN @{ printf "%-10s %s\n", "Name", "Number"
-              printf "%-10s %s\n", "----", "------" @}
-      @{ printf "%-10s %s\n", $1, $2 @}' BBS-list
- @end example
-
- @noindent
- By outputting each column heading with the same format specification
- used for the elements of the column, we have made sure that the headings
- are aligned just like the columns.
-
- The fact that the same format specification is used three times can be
- emphasized by storing it in a variable, like this:
-
- @example
- awk 'BEGIN @{ format = "%-10s %s\n"
- printf format, "Name", "Number"
-