This is Info file gawk.info, produced by Makeinfo-1.55 from the input file /gnu-src/gawk-2.15.6/gawk.texi. This file documents `awk', a program that you can use to select particular records in a file and perform operations upon them. This is Edition 0.15 of `The GAWK Manual', for the 2.15 version of the GNU implementation of AWK. Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. File: gawk.info, Node: V7/S5R3.1, Next: S5R4, Prev: Language History, Up: Language History Major Changes between V7 and S5R3.1 =================================== The `awk' language evolved considerably between the release of Version 7 Unix (1978) and the new version first made widely available in System V Release 3.1 (1987). This section summarizes the changes, with cross-references to further details. * The requirement for `;' to separate rules on a line (*note `awk' Statements versus Lines: Statements/Lines.). * User-defined functions, and the `return' statement (*note User-defined Functions: User-defined.). * The `delete' statement (*note The `delete' Statement: Delete.). * The `do'-`while' statement (*note The `do'-`while' Statement: Do Statement.). * The built-in functions `atan2', `cos', `sin', `rand' and `srand' (*note Numeric Built-in Functions: Numeric Functions.). 
* The built-in functions `gsub', `sub', and `match' (*note Built-in Functions for String Manipulation: String Functions.). * The built-in functions `close', which closes an open file, and `system', which allows the user to execute operating system commands (*note Built-in Functions for Input/Output: I/O Functions.). * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP' built-in variables (*note Built-in Variables::.). * The conditional expression using the operators `?' and `:' (*note Conditional Expressions: Conditional Exp.). * The exponentiation operator `^' (*note Arithmetic Operators: Arithmetic Ops.) and its assignment operator form `^=' (*note Assignment Expressions: Assignment Ops.). * C-compatible operator precedence, which breaks some old `awk' programs (*note Operator Precedence (How Operators Nest): Precedence.). * Regexps as the value of `FS' (*note Specifying how Fields are Separated: Field Separators.), and as the third argument to the `split' function (*note Built-in Functions for String Manipulation: String Functions.). * Dynamic regexps as operands of the `~' and `!~' operators (*note How to Use Regular Expressions: Regexp Usage.). * Escape sequences (*note Constant Expressions: Constants.) in regexps. * The escape sequences `\b', `\f', and `\r' (*note Constant Expressions: Constants.). * Redirection of input for the `getline' function (*note Explicit Input with `getline': Getline.). * Multiple `BEGIN' and `END' rules (*note `BEGIN' and `END' Special Patterns: BEGIN/END.). * Simulated multi-dimensional arrays (*note Multi-dimensional Arrays: Multi-dimensional.). File: gawk.info, Node: S5R4, Next: POSIX, Prev: V7/S5R3.1, Up: Language History Changes between S5R3.1 and S5R4 =============================== The System V Release 4 version of Unix `awk' added these features (some of which originated in `gawk'): * The `ENVIRON' variable (*note Built-in Variables::.). 
* Multiple `-f' options on the command line (*note Invoking `awk': Command Line.). * The `-v' option for assigning variables before program execution begins (*note Invoking `awk': Command Line.). * The `--' option for terminating command line options. * The `\a', `\v', and `\x' escape sequences (*note Constant Expressions: Constants.). * A defined return value for the `srand' built-in function (*note Numeric Built-in Functions: Numeric Functions.). * The `toupper' and `tolower' built-in string functions for case translation (*note Built-in Functions for String Manipulation: String Functions.). * A cleaner specification for the `%c' format-control letter in the `printf' function (*note Using `printf' Statements for Fancier Printing: Printf.). * The ability to dynamically pass the field width and precision (`"%*.*d"') in the argument list of the `printf' function (*note Using `printf' Statements for Fancier Printing: Printf.). * The use of constant regexps such as `/foo/' as expressions, where they are equivalent to use of the matching operator, as in `$0 ~ /foo/' (*note Constant Expressions: Constants.). File: gawk.info, Node: POSIX, Next: POSIX/GNU, Prev: S5R4, Up: Language History Changes between S5R4 and POSIX `awk' ==================================== The POSIX Command Language and Utilities standard for `awk' introduced the following changes into the language: * The use of `-W' for implementation-specific options. * The use of `CONVFMT' for controlling the conversion of numbers to strings (*note Conversion of Strings and Numbers: Conversion.). * The concept of a numeric string, and tighter comparison rules to go with it (*note Comparison Expressions: Comparison Ops.). * More complete documentation of many of the previously undocumented features of the language. 
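Two of the POSIX changes listed above can be sketched from the shell. This is only an illustration: the variable names (`converted', `compared') and the sample values are assumptions made for the example, and the results assume a POSIX-conforming `awk'.

```shell
# CONVFMT controls number-to-string conversion (here, concatenation with "").
# Integral values are converted as integers, whatever CONVFMT says.
converted=$(awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; y = 17
                         s = x ""; t = y ""; print s, t }')
echo "$converted"

# Fields that look like numbers are "numeric strings": they compare as
# numbers, so 10 is greater than 9 (a string comparison would say otherwise).
compared=$(echo "10 9" | awk '{ if ($1 > $2) print "numeric"; else print "string" }')
echo "$compared"
```

With such an `awk', the first command is expected to print `3.14 17' and the second `numeric'.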
File: gawk.info, Node: POSIX/GNU, Prev: POSIX, Up: Language History Extensions in `gawk' not in POSIX `awk' ======================================= The GNU implementation, `gawk', adds these features: * The `AWKPATH' environment variable for specifying a path search for the `-f' command line option (*note Invoking `awk': Command Line.). * The various `gawk' specific features available via the `-W' command line option (*note Invoking `awk': Command Line.). * The `ARGIND' variable, that tracks the movement of `FILENAME' through `ARGV'. (*note Built-in Variables::.). * The `ERRNO' variable, that contains the system error message when `getline' returns -1, or when `close' fails. (*note Built-in Variables::.). * The `IGNORECASE' variable and its effects (*note Case-sensitivity in Matching: Case-sensitivity.). * The `FIELDWIDTHS' variable and its effects (*note Reading Fixed-width Data: Constant Size.). * The `next file' statement for skipping to the next data file (*note The `next file' Statement: Next File Statement.). * The `systime' and `strftime' built-in functions for obtaining and printing time stamps (*note Functions for Dealing with Time Stamps: Time Functions.). * The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N' file name interpretation (*note Standard I/O Streams: Special Files.). * The `-W compat' option to turn off these extensions (*note Invoking `awk': Command Line.). * The `-W posix' option for full POSIX compliance (*note Invoking `awk': Command Line.). File: gawk.info, Node: Installation, Next: Gawk Summary, Prev: Language History, Up: Top Installing `gawk' ***************** This chapter provides instructions for installing `gawk' on the various platforms that are supported by the developers. The primary developers support Unix (and one day, GNU), while the other ports were contributed. The file `ACKNOWLEDGMENT' in the `gawk' distribution lists the electronic mail addresses of the people who did the respective ports. 
* Menu:

* Gawk Distribution::           What is in the `gawk' distribution.
* Unix Installation::           Installing `gawk' under various versions
                                of Unix.
* VMS Installation::            Installing `gawk' on VMS.
* MS-DOS Installation::         Installing `gawk' on MS-DOS.
* Atari Installation::          Installing `gawk' on the Atari ST.

File: gawk.info,  Node: Gawk Distribution,  Next: Unix Installation,  Prev: Installation,  Up: Installation

The `gawk' Distribution
=======================

This section first describes how to get and extract the `gawk'
distribution, and then discusses what is in the various files and
subdirectories.

* Menu:

* Extracting::                  How to get and extract the distribution.
* Distribution contents::       What is in the distribution.

File: gawk.info,  Node: Extracting,  Next: Distribution contents,  Prev: Gawk Distribution,  Up: Gawk Distribution

Getting the `gawk' Distribution
-------------------------------

`gawk' is distributed as a `tar' file compressed with the GNU Zip
program, `gzip'. You can get it via anonymous `ftp' to the Internet
host `prep.ai.mit.edu'. Like all GNU software, it will be archived at
other well-known systems, from which it will be possible to use some
sort of anonymous `uucp' to obtain the distribution as well. You can
also order `gawk' on tape or CD-ROM directly from the Free Software
Foundation. (The address is on the copyright page.) Doing so directly
contributes to the support of the foundation and to the production of
more free software.

Once you have the distribution (for example, `gawk-2.15.0.tar.z'),
first use `gzip' to expand the file, and then use `tar' to extract it.
You can use the following pipeline to produce the `gawk' distribution:

     # Under System V, add 'o' to the tar flags
     gzip -d -c gawk-2.15.0.tar.z | tar -xvpf -

This will create a directory named `gawk-2.15' in the current
directory.

The distribution file name is of the form `gawk-2.15.N.tar.z'. The N
represents a "patchlevel", meaning that minor bugs have been fixed in
the major release.
The current patchlevel is 0, but when retrieving distributions, you should get the version with the highest patchlevel. If you are not on a Unix system, you will need to make other arrangements for getting and extracting the `gawk' distribution. You should consult a local expert. File: gawk.info, Node: Distribution contents, Prev: Extracting, Up: Gawk Distribution Contents of the `gawk' Distribution ----------------------------------- `gawk' has a number of C source files, documentation files, subdirectories and files related to the configuration process (*note Compiling and Installing `gawk' on Unix: Unix Installation.), and several subdirectories related to different, non-Unix, operating systems. various `.c', `.y', and `.h' files The C and YACC source files are the actual `gawk' source code. `README' `README.VMS' `README.dos' `README.rs6000' `README.ultrix' Descriptive files: `README' for `gawk' under Unix, and the rest for the various hardware and software combinations. `PORTS' A list of systems to which `gawk' has been ported, and which have successfully run the test suite. `ACKNOWLEDGMENT' A list of the people who contributed major parts of the code or documentation. `NEWS' A list of changes to `gawk' since the last release or patch. `COPYING' The GNU General Public License. `FUTURES' A brief list of features and/or changes being contemplated for future releases, with some indication of the time frame for the feature, based on its difficulty. `LIMITATIONS' A list of those factors that limit `gawk''s performance. Most of these depend on the hardware or operating system software, and are not limits in `gawk' itself. `PROBLEMS' A file describing known problems with the current release. `gawk.1' The `troff' source for a manual page describing `gawk'. `gawk.texinfo' The `texinfo' source file for this Info file. It should be processed with TeX to produce a printed manual, and with `makeinfo' to produce the Info file. 
`Makefile.in'
`config'
`config.in'
`configure'
`missing'
`mungeconf'
     These files and subdirectories are used when configuring `gawk'
     for various Unix systems. They are explained in detail in *Note
     Compiling and Installing `gawk' on Unix: Unix Installation.

`atari'
     Files needed for building `gawk' on an Atari ST. *Note Installing
     `gawk' on the Atari ST: Atari Installation, for details.

`pc'
     Files needed for building `gawk' under MS-DOS. *Note Installing
     `gawk' on MS-DOS: MS-DOS Installation, for details.

`vms'
     Files needed for building `gawk' under VMS. *Note Compiling,
     Installing, and Running `gawk' on VMS: VMS Installation, for
     details.

`test'
     Many interesting `awk' programs, provided as a test suite for
     `gawk'. You can use `make test' from the top level `gawk'
     directory to run your version of `gawk' against the test suite.
     If `gawk' successfully passes `make test' then you can be
     confident of a successful port.

File: gawk.info,  Node: Unix Installation,  Next: VMS Installation,  Prev: Gawk Distribution,  Up: Installation

Compiling and Installing `gawk' on Unix
=======================================

Often, you can compile and install `gawk' by typing only two commands.
However, if you do not use a supported system, you may need to
configure `gawk' for your system yourself.

* Menu:

* Quick Installation::          Compiling `gawk' on a supported Unix
                                version.
* Configuration Philosophy::    How it's all supposed to work.
* New Configurations::          What to do if there is no supplied
                                configuration for your system.

File: gawk.info,  Node: Quick Installation,  Next: Configuration Philosophy,  Prev: Unix Installation,  Up: Unix Installation

Compiling `gawk' for a Supported Unix Version
---------------------------------------------

After you have extracted the `gawk' distribution, `cd' to `gawk-2.15'.
Look in the `config' subdirectory for a file that matches your
hardware/software combination.
In general, only the software is relevant; for example `sunos41' is
used for SunOS 4.1, on both Sun 3 and Sun 4 hardware.

If you find such a file, run the command:

     # assume you have SunOS 4.1
     ./configure sunos41

This produces a `Makefile' and `config.h' tailored to your system. You
may wish to edit the `Makefile' to use a different C compiler, such as
`gcc', the GNU C compiler, if you have it. You may also wish to change
the `CFLAGS' variable, which controls the command line options that
are passed to the C compiler (such as optimization levels, or
compiling for debugging).

After you have configured `Makefile' and `config.h', type:

     make

and shortly thereafter, you should have an executable version of
`gawk'. That's all there is to it!

File: gawk.info,  Node: Configuration Philosophy,  Next: New Configurations,  Prev: Quick Installation,  Up: Unix Installation

The Configuration Process
-------------------------

(This section is of interest only if you know something about using
the C language and the Unix operating system.)

The source code for `gawk' generally attempts to adhere to industry
standards wherever possible. This means that `gawk' uses library
routines that are specified by the ANSI C standard and by the POSIX
operating system interface standard. When using an ANSI C compiler,
function prototypes are provided to help improve the compile-time
checking.

Many older Unix systems do not support all of either the ANSI or the
POSIX standards. The `missing' subdirectory in the `gawk' distribution
contains replacement versions of those subroutines that are most
likely to be missing.

The `config.h' file that is created by the `configure' program
contains definitions that describe features of the particular
operating system where you are attempting to compile `gawk'. For the
most part, it lists which standard subroutines are *not* available.
For example, if your system lacks the `getopt' routine, then
`GETOPT_MISSING' would be defined.

`config.h' also defines constants that describe facts about your
variant of Unix. For example, there may not be an `st_blksize' element
in the `stat' structure. In this case `BLKSIZE_MISSING' would be
defined.

Based on the list in `config.h' of standard subroutines that are
missing, `missing.c' will do a `#include' of the appropriate file(s)
from the `missing' subdirectory.

Conditionally compiled code in the other source files relies on the
other definitions in the `config.h' file.

Besides creating `config.h', `configure' produces a `Makefile' from
`Makefile.in'. There are a number of lines in `Makefile.in' that are
system or feature specific. For example, there is a line that begins
with `##MAKE_ALLOCA_C##'. This is normally a comment line, since it
starts with `#'. If a configuration file has `MAKE_ALLOCA_C' in it,
then `configure' will delete the `##MAKE_ALLOCA_C##' from the
beginning of the line. This will enable the rules in the `Makefile'
that use a C version of `alloca'. There are several similar features
that work in this fashion.

File: gawk.info,  Node: New Configurations,  Prev: Configuration Philosophy,  Up: Unix Installation

Configuring `gawk' for a New System
-----------------------------------

(This section is of interest only if you know something about using
the C language and the Unix operating system, and if you have to
install `gawk' on a system that is not supported by the `gawk'
distribution. If you are a C or Unix novice, get help from a local
expert.)

If you need to configure `gawk' for a Unix system that is not
supported in the distribution, first see *Note The Configuration
Process: Configuration Philosophy. Then, copy `config.in' to
`config.h', and copy `Makefile.in' to `Makefile'. Next, edit both
files. Both files are liberally commented, and the necessary changes
should be straightforward.

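When editing by hand, the `##MAKE_...##' markers described in the configuration process can be stripped with an ordinary stream edit. The following is a minimal sketch, not the way `configure' itself is written: the `Makefile.in' fragment, the `ALLOCA' and `OTHER' assignments, and the `##MAKE_OTHER##' marker are hypothetical, made up for the illustration.

```shell
# Work in a scratch directory so that no real Makefile is touched.
workdir=$(mktemp -d)

# Hypothetical Makefile.in fragment with two feature-specific lines.
cat > "$workdir/Makefile.in" <<'EOF'
CC = cc
##MAKE_ALLOCA_C## ALLOCA = alloca.o
##MAKE_OTHER## OTHER = other.o
EOF

# Enable only the MAKE_ALLOCA_C feature by deleting its marker from the
# beginning of the line; every other marked line stays a comment.
sed 's/^##MAKE_ALLOCA_C##[ ]*//' "$workdir/Makefile.in" > "$workdir/Makefile"

enabled=$(grep -c '^ALLOCA' "$workdir/Makefile")
still_disabled=$(grep -c '^##MAKE_OTHER##' "$workdir/Makefile")
cat "$workdir/Makefile"
rm -rf "$workdir"
```

Anchoring the pattern with `^' keeps the edit from touching a marker name that merely appears later in a line.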
While editing `config.h', you need to determine what library routines you do or do not have by consulting your system documentation, or by perusing your actual libraries using the `ar' or `nm' utilities. In the worst case, simply do not define *any* of the macros for missing subroutines. When you compile `gawk', the final link-editing step will fail. The link editor will provide you with a list of unresolved external references--these are the missing subroutines. Edit `config.h' again and recompile, and you should be set. Editing the `Makefile' should also be straightforward. Enable or disable the lines that begin with `##MAKE_WHATEVER##', as appropriate. Select the correct C compiler and `CFLAGS' for it. Then run `make'. Getting a correct configuration is likely to be an iterative process. Do not be discouraged if it takes you several tries. If you have no luck whatsoever, please report your system type, and the steps you took. Once you do have a working configuration, please send it to the maintainers so that support for your system can be added to the official release. *Note Reporting Problems and Bugs: Bugs, for information on how to report problems in configuring `gawk'. You may also use the same mechanisms for sending in new configurations. File: gawk.info, Node: VMS Installation, Next: MS-DOS Installation, Prev: Unix Installation, Up: Installation Compiling, Installing, and Running `gawk' on VMS ================================================ This section describes how to compile and install `gawk' under VMS. * Menu: * VMS Compilation:: How to compile `gawk' under VMS. * VMS Installation Details:: How to install `gawk' under VMS. * VMS Running:: How to run `gawk' under VMS. * VMS POSIX:: Alternate instructions for VMS POSIX. 
File: gawk.info,  Node: VMS Compilation,  Next: VMS Installation Details,  Prev: VMS Installation,  Up: VMS Installation

Compiling `gawk' under VMS
--------------------------

To compile `gawk' under VMS, there is a `DCL' command procedure that
will issue all the necessary `CC' and `LINK' commands, and there is
also a `Makefile' for use with the `MMS' utility. From the source
directory, use either

     $ @[.VMS]VMSBUILD.COM

or

     $ MMS/DESCRIPTION=[.VMS]DESCRIP.MMS GAWK

Depending upon which C compiler you are using, follow one of the sets
of instructions in this table:

VAX C V3.x
     Use either `vmsbuild.com' or `descrip.mms' as is. These use
     `CC/OPTIMIZE=NOLINE', which is essential for Version 3.0.

VAX C V2.x
     You must have Version 2.3 or 2.4; older ones won't work. Edit
     either `vmsbuild.com' or `descrip.mms' according to the comments
     in them. For `vmsbuild.com', this just entails removing two `!'
     delimiters. Also edit `config.h' (which is a copy of file
     `[.config]vms-conf.h') and comment out or delete the two lines
     `#define __STDC__ 0' and `#define VAXC_BUILTINS' near the end.

GNU C
     Edit `vmsbuild.com' or `descrip.mms'; the changes are different
     from those for VAX C V2.x, but equally straightforward. No
     changes to `config.h' should be needed.

DEC C
     Edit `vmsbuild.com' or `descrip.mms' according to their comments.
     No changes to `config.h' should be needed.

`gawk' 2.15 has been tested under VAX/VMS 5.5-1 using VAX C V3.2, GNU
C 1.40 and 2.3. It should work without modifications for VMS V4.6 and
up.

File: gawk.info,  Node: VMS Installation Details,  Next: VMS Running,  Prev: VMS Compilation,  Up: VMS Installation

Installing `gawk' on VMS
------------------------

To install `gawk', all you need is a "foreign" command, which is a
`DCL' symbol whose value begins with a dollar sign.

     $ GAWK :== $device:[directory]GAWK

(Substitute the actual location of `gawk.exe' for
`device:[directory]'.)
The symbol should be placed in the `login.com' of any user who wishes
to run `gawk', so that it will be defined every time the user logs on.
Alternatively, the symbol may be placed in the system-wide
`sylogin.com' procedure, which will allow all users to run `gawk'.

Optionally, the help entry can be loaded into a VMS help library:

     $ LIBRARY/HELP SYS$HELP:HELPLIB [.VMS]GAWK.HLP

(You may want to substitute a site-specific help library rather than
the standard VMS library `HELPLIB'.) After loading the help text,

     $ HELP GAWK

will provide information about both the `gawk' implementation and the
`awk' programming language.

The logical name `AWK_LIBRARY' can designate a default location for
`awk' program files. For the `-f' option, if the specified filename
has no device or directory path information in it, `gawk' will look in
the current directory first, then in the directory specified by the
translation of `AWK_LIBRARY' if the file was not found. If after
searching in both directories, the file still is not found, then
`gawk' appends the suffix `.awk' to the filename and the file search
will be re-tried. If `AWK_LIBRARY' is not defined, that portion of the
file search will fail benignly.

File: gawk.info,  Node: VMS Running,  Next: VMS POSIX,  Prev: VMS Installation Details,  Up: VMS Installation

Running `gawk' on VMS
---------------------

Command line parsing and quoting conventions are significantly
different on VMS, so examples in this manual or from other sources
often need minor changes. They *are* minor though, and all `awk'
programs should run correctly.

Here are a couple of trivial tests:

     $ gawk -- "BEGIN {print ""Hello, World!""}"
     $ gawk -"W" version ! could also be -"W version" or "-W version"

Note that upper-case and mixed-case text must be quoted.

The VMS port of `gawk' includes a `DCL'-style interface in addition to
the original shell-style interface (see the help entry for details).
One side-effect of dual command line parsing is that if there is only
a single parameter (as in the quoted string program above), the
command becomes ambiguous. To work around this, the normally optional
`--' flag is required to force Unix style rather than `DCL' parsing.
If any other dash-type options (or multiple parameters such as data
files to be processed) are present, there is no ambiguity and `--' can
be omitted.

The default search path when looking for `awk' program files specified
by the `-f' option is `"SYS$DISK:[],AWK_LIBRARY:"'. The logical name
`AWKPATH' can be used to override this default. The format of
`AWKPATH' is a comma-separated list of directory specifications. When
defining it, the value should be quoted so that it retains a single
translation, and not a multi-translation `RMS' searchlist.

File: gawk.info,  Node: VMS POSIX,  Prev: VMS Running,  Up: VMS Installation

Building and using `gawk' under VMS POSIX
-----------------------------------------

Ignore the instructions above, although `vms/gawk.hlp' should still be
made available in a help library. Make sure that the two scripts,
`configure' and `mungeconf', are executable; use `chmod +x' on them if
necessary. Then execute the following commands:

     $ POSIX
     psx> configure vms-posix
     psx> make awktab.c gawk

The first command will construct files `config.h' and `Makefile' out
of templates. The second command will compile and link `gawk'. Due to
a `make' bug in VMS POSIX V1.0 and V1.1, the file `awktab.c' must be
given as an explicit target or it will not be built and the final link
step will fail. Ignore the warning `"Could not find lib m in lib
list"'; it is harmless, caused by the explicit use of `-lm' as a
linker option which is not needed under VMS POSIX. Under V1.1 (but not
V1.0) a problem with the `yacc' skeleton `/etc/yyparse.c' will cause a
compiler warning for `awktab.c', followed by a linker warning about
compilation warnings in the resulting object module. These warnings
can be ignored.

Once built, `gawk' will work like any other shell utility. Unlike the
normal VMS port of `gawk', no special command line manipulation is
needed in the VMS POSIX environment.

File: gawk.info,  Node: MS-DOS Installation,  Next: Atari Installation,  Prev: VMS Installation,  Up: Installation

Installing `gawk' on MS-DOS
===========================

The first step is to get all the files in the `gawk' distribution onto
your PC. Move all the files from the `pc' directory into the main
directory where the other files are. Edit the file `make.bat' so that
it will be an acceptable MS-DOS batch file. This means making sure
that all lines are terminated with the ASCII carriage return and line
feed characters.

`gawk' has only been compiled with version 5.1 of the Microsoft C
compiler. The file `make.bat' from the `pc' directory assumes that you
have this compiler.

Copy the file `setargv.obj' from the library directory where it
resides to the `gawk' source code directory.

Run `make.bat'. This will compile `gawk' for you, and link it. That's
all there is to it!

File: gawk.info,  Node: Atari Installation,  Prev: MS-DOS Installation,  Up: Installation

Installing `gawk' on the Atari ST
=================================

This section assumes that you are running TOS. It applies to other
Atari models (STe, TT) as well.

In order to use `gawk', you need to have a shell, either text or
graphics, that does not map all the characters of a command line to
upper case. Maintaining case distinction in option flags is very
important (*note Invoking `awk': Command Line.). Popular shells like
`gulam' or `gemini' will work, as will newer versions of `desktop'.
Support for I/O redirection is necessary to make it easy to import
`awk' programs from other environments. Pipes are nice to have, but
not vital.

If you have received an executable version of `gawk', place it, as
usual, anywhere in your `PATH' where your shell will find it.

While executing, `gawk' creates a number of temporary files.
`gawk' looks for either of the environment variables `TEMP' or
`TMPDIR', in that order. If either one is found, its value is assumed
to be a directory for temporary files. This directory must exist, and
if you can spare the memory, it is a good idea to put it on a RAM
drive. If neither `TEMP' nor `TMPDIR' is found, then `gawk' uses the
current directory for its temporary files.

The ST version of `gawk' searches for its program files as described
in *Note The `AWKPATH' Environment Variable: AWKPATH Variable. On the
ST, the default value for the `AWKPATH' variable is
`".,c:\lib\awk,c:\gnu\lib\awk"'. The search path can be modified by
explicitly setting `AWKPATH' to whatever you wish. Note that colons
cannot be used on the ST to separate elements in the `AWKPATH'
variable, since they have another, reserved, meaning. Instead, you
must use a comma to separate elements in the path. If you are
recompiling `gawk' on the ST, then you can choose a new default search
path, by setting the value of `DEFPATH' in the file
`...\config\atari'. You may choose a different separator character by
setting the value of `ENVSEP' in the same file. The new values will be
used when creating the header file `config.h'.

Although `awk' allows great flexibility in doing I/O redirections from
within a program, this facility should be used with care on the ST. In
some circumstances the OS routines for file handle pool processing
lose track of certain events, causing the computer to crash, and
requiring a reboot. Often a warm reboot is sufficient. Fortunately,
this happens infrequently, and in rather esoteric situations. In
particular, avoid having one part of an `awk' program using `print'
statements explicitly redirected to `"/dev/stdout"', while other
`print' statements use the default standard output, and a calling
shell has redirected standard output to a file.

When `gawk' is compiled with the ST version of `gcc' and its usual
libraries, it will accept both `/' and `\' as path separators.
While this is convenient, it should be remembered that this removes one, technically legal, character (`/') from your file names, and that it may create problems for external programs, called via the `system()' function, which may not support this convention. Whenever it is possible that a file created by `gawk' will be used by some other program, use only backslashes. Also remember that in `awk', backslashes in strings have to be doubled in order to get literal backslashes. The initial port of `gawk' to the ST was done with `gcc'. If you wish to recompile `gawk' from scratch, you will need to use a compiler that accepts ANSI standard C (such as `gcc', Turbo C, or Prospero C). If `sizeof(int) != sizeof(int *)', the correctness of the generated code depends heavily on the fact that all function calls have function prototypes in the current scope. If your compiler does not accept function prototypes, you will probably have to add a number of casts to the code. If you are using `gcc', make sure that you have up-to-date libraries. Older versions have problems with some library functions (`atan2()', `strftime()', the `%g' conversion in `sprintf()') which may affect the operation of `gawk'. In the `atari' subdirectory of the `gawk' distribution is a version of the `system()' function that has been tested with `gulam' and `msh'; it should work with other shells as well. With `gulam', it passes the string to be executed without spawning an extra copy of a shell. It is possible to replace this version of `system()' with a similar function from a library or from some other source if that version would be a better choice for the shell you prefer. The files needed to recompile `gawk' on the ST can be found in the `atari' directory. The provided files and instructions below assume that you have the GNU C compiler (`gcc'), the `gulam' shell, and an ST version of `sed'. The `Makefile' is set up to use `byacc' as a `yacc' replacement. 
With a different set of tools some adjustments and/or editing will be
needed.

`cd' to the `atari' directory. Copy `Makefile.st' to `makefile' in the
source (parent) directory. Possibly adjust `../config/atari' to suit
your system. Execute the script `mkconf.g' which will create the
header file `../config.h'. Go back to the source directory. If you are
not using `gcc', check the file `missing.c'. It may be necessary to
change forward slashes in the references to files from the `atari'
subdirectory into backslashes. Type `make' and enjoy.

Compilation with `gcc' of some of the bigger modules, like
`awk_tab.c', may require a full four megabytes of memory. On smaller
machines you would need to cut down on optimizations, or you would
have to switch to another, less memory hungry, compiler.

File: gawk.info,  Node: Gawk Summary,  Next: Sample Program,  Prev: Installation,  Up: Top

`gawk' Summary
**************

This appendix provides a brief summary of the `gawk' command line and
the `awk' language. It is designed to serve as a "quick reference." It
is therefore terse, but complete.

* Menu:

* Command Line Summary::        Recapitulation of the command line.
* Language Summary::            A terse review of the language.
* Variables/Fields::            Variables, fields, and arrays.
* Rules Summary::               Patterns and Actions, and their
                                component parts.
* Functions Summary::           Defining and calling functions.
* Historical Features::         Some undocumented but supported
                                "features".

File: gawk.info,  Node: Command Line Summary,  Next: Language Summary,  Prev: Gawk Summary,  Up: Gawk Summary

Command Line Options Summary
============================

The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:

     awk [POSIX OR GNU STYLE OPTIONS] -f source-file [`--'] FILE ...
     awk [POSIX OR GNU STYLE OPTIONS] [`--'] 'PROGRAM' FILE ...

The options that `gawk' accepts are:

`-F FS'
`--field-separator=FS'
     Use FS for the input field separator (the value of the `FS' predefined variable).

`-f PROGRAM-FILE'
`--file=PROGRAM-FILE'
     Read the `awk' program source from the file PROGRAM-FILE, instead of from the first command line argument.

`-v VAR=VAL'
`--assign=VAR=VAL'
     Assign the variable VAR the value VAL before program execution begins.

`-W compat'
`--compat'
     Specifies compatibility mode, in which `gawk' extensions are turned off.

`-W copyleft'
`-W copyright'
`--copyleft'
`--copyright'
     Print the short version of the General Public License on the error output. This option may disappear in a future version of `gawk'.

`-W help'
`-W usage'
`--help'
`--usage'
     Print a relatively short summary of the available options on the error output.

`-W lint'
`--lint'
     Give warnings about dubious or non-portable `awk' constructs.

`-W posix'
`--posix'
     Specifies POSIX compatibility mode, in which `gawk' extensions are turned off and additional restrictions apply.

`-W source=PROGRAM-TEXT'
`--source=PROGRAM-TEXT'
     Use PROGRAM-TEXT as `awk' program source code. This option allows mixing command line source code with source code from files, and is particularly useful for mixing command line programs with library functions.

`-W version'
`--version'
     Print version information for this particular copy of `gawk' on the error output. This option may disappear in a future version of `gawk'.

`--'
     Signal the end of options. This is useful to allow further arguments to the `awk' program itself to start with a `-'. This is mainly for consistency with the argument parsing conventions of POSIX.

Any other options are flagged as invalid, but are otherwise ignored.

*Note Invoking `awk': Command Line, for more details.
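As a rough illustration of the options above, the following sketch exercises `-F', `-v', and `--' using only forms that a POSIX `awk' should also accept. The sample data and the variable name `prefix' are invented for the example and are not part of `gawk'.

```shell
# Sample colon-separated input (made up for this example).
data='root:x:0
daemon:x:1'

# -F sets FS to ":", so $1 and $3 are the first and third fields.
# -v assigns the hypothetical variable "prefix" before the program starts.
printf '%s\n' "$data" | awk -F: -v prefix='user=' '{ print prefix $1, $3 }'

# "--" ends option processing, so the argument "-x" is stored in ARGV
# rather than being parsed as an option.
awk -- 'BEGIN { print ARGV[1] }' -x
```

The first command should print one line per input record, with the prefix joined to the first field; the second prints the argument that follows the program text.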
File: gawk.info, Node: Language Summary, Next: Variables/Fields, Prev: Command Line Summary, Up: Gawk Summary

Language Summary
================

An `awk' program consists of a sequence of pattern-action statements and optional function definitions.

     PATTERN    { ACTION STATEMENTS }

     function NAME(PARAMETER LIST)     { ACTION STATEMENTS }

`gawk' first reads the program source from the PROGRAM-FILE(s) if specified, or from the first non-option argument on the command line. The `-f' option may be used multiple times on the command line. `gawk' reads the program text from all the PROGRAM-FILE files, effectively concatenating them in the order they are specified. This is useful for building libraries of `awk' functions, without having to include them in each new `awk' program that uses them. To use a library function in a file from a program typed in on the command line, specify `-f /dev/tty'; then type your program, and end it with a `Control-d'. *Note Invoking `awk': Command Line.

The environment variable `AWKPATH' specifies a search path to use when finding source files named with the `-f' option. The default path, which is `.:/local/lib/awk:/gnu/lib/awk' is used if `AWKPATH' is not set. If a file name given to the `-f' option contains a `/' character, no path search is performed. *Note The `AWKPATH' Environment Variable: AWKPATH Variable, for a full description of the `AWKPATH' environment variable.

`gawk' compiles the program into an internal form, and then proceeds to read each file named in the `ARGV' array. If there are no files named on the command line, `gawk' reads the standard input.

If a "file" named on the command line has the form `VAR=VAL', it is treated as a variable assignment: the variable VAR is assigned the value VAL. If any of the files have a value that is the null string, that element in the list is skipped.

For each line in the input, `gawk' tests to see if it matches any PATTERN in the `awk' program.
For each pattern that the line matches, the associated ACTION is executed.

File: gawk.info, Node: Variables/Fields, Next: Rules Summary, Prev: Language Summary, Up: Gawk Summary

Variables and Fields
====================

`awk' variables are dynamic; they come into existence when they are first used. Their values are either floating-point numbers or strings. `awk' also has one-dimensional arrays; multi-dimensional arrays may be simulated. There are several predefined variables that `awk' sets as a program runs; these are summarized below.

* Menu:

* Fields Summary::              Input field splitting.
* Built-in Summary::            `awk''s built-in variables.
* Arrays Summary::              Using arrays.
* Data Type Summary::           Values in `awk' are numbers or strings.

File: gawk.info, Node: Fields Summary, Next: Built-in Summary, Prev: Variables/Fields, Up: Variables/Fields

Fields
------

As each input line is read, `gawk' splits the line into FIELDS, using the value of the `FS' variable as the field separator. If `FS' is a single character, fields are separated by that character. Otherwise, `FS' is expected to be a full regular expression. In the special case that `FS' is a single blank, fields are separated by runs of blanks and/or tabs. Note that the value of `IGNORECASE' (*note Case-sensitivity in Matching: Case-sensitivity.) also affects how fields are split when `FS' is a regular expression.

Each field in the input line may be referenced by its position, `$1', `$2', and so on. `$0' is the whole line. The value of a field may be assigned to as well. Field numbers need not be constants:

     n = 5
     print $n

prints the fifth field in the input line. The variable `NF' is set to the total number of fields in the input line.

References to nonexistent fields (i.e., fields after `$NF') return the null string.
However, assigning to a nonexistent field (e.g., `$(NF+2) = 5') increases the value of `NF', creates any intervening fields with the null string as their value, and causes the value of `$0' to be recomputed, with the fields being separated by the value of `OFS'.

*Note Reading Input Files: Reading Files, for a full description of the way `awk' defines and uses fields.

File: gawk.info, Node: Built-in Summary, Next: Arrays Summary, Prev: Fields Summary, Up: Variables/Fields

Built-in Variables
------------------

`awk''s built-in variables are:

`ARGC'
     The number of command line arguments (not including options or the `awk' program itself).

`ARGIND'
     The index in `ARGV' of the current file being processed. It is always true that `FILENAME == ARGV[ARGIND]'.

`ARGV'
     The array of command line arguments. The array is indexed from 0 to `ARGC' - 1. Dynamically changing the contents of `ARGV' can control the files used for data.

`CONVFMT'
     The conversion format to use when converting numbers to strings.

`FIELDWIDTHS'
     A space separated list of numbers describing the fixed-width input data.

`ENVIRON'
     An array containing the values of the environment variables. The array is indexed by variable name, each element being the value of that variable. Thus, the environment variable `HOME' would be in `ENVIRON["HOME"]'. Its value might be `/u/close'.

     Changing this array does not affect the environment seen by programs which `gawk' spawns via redirection or the `system' function. (This may change in a future version of `gawk'.)

     Some operating systems do not have environment variables. The array `ENVIRON' is empty when running on these systems.

`ERRNO'
     The system error message when an error occurs using `getline' or `close'.

`FILENAME'
     The name of the current input file. If no files are specified on the command line, the value of `FILENAME' is `-'.

`FNR'
     The input record number in the current input file.

`FS'
     The input field separator, a blank by default.
`IGNORECASE'
     The case-sensitivity flag for regular expression operations. If `IGNORECASE' has a nonzero value, then pattern matching in rules, field splitting with `FS', regular expression matching with `~' and `!~', and the `gsub', `index', `match', `split' and `sub' predefined functions all ignore case when doing regular expression operations.

`NF'
     The number of fields in the current input record.

`NR'
     The total number of input records seen so far.

`OFMT'
     The output format for numbers for the `print' statement, `"%.6g"' by default.

`OFS'
     The output field separator, a blank by default.

`ORS'
     The output record separator, by default a newline.

`RS'
     The input record separator, by default a newline. `RS' is exceptional in that only the first character of its string value is used for separating records. If `RS' is set to the null string, then records are separated by blank lines. When `RS' is set to the null string, then the newline character always acts as a field separator, in addition to whatever value `FS' may have.

`RSTART'
     The index of the first character matched by `match'; 0 if no match.

`RLENGTH'
     The length of the string matched by `match'; -1 if no match.

`SUBSEP'
     The string used to separate multiple subscripts in array elements, by default `"\034"'.

*Note Built-in Variables::, for more information.

File: gawk.info, Node: Arrays Summary, Next: Data Type Summary, Prev: Built-in Summary, Up: Variables/Fields

Arrays
------

Arrays are subscripted with an expression between square brackets (`[' and `]'). Array subscripts are *always* strings; numbers are converted to strings as necessary, following the standard conversion rules (*note Conversion of Strings and Numbers: Conversion.).

If you use multiple expressions separated by commas inside the square brackets, then the array subscript is a string consisting of the concatenation of the individual subscript values, converted to strings, separated by the subscript separator (the value of `SUBSEP').
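The subscript rule just described can be checked with a small sketch. It assumes only standard `awk' behavior; the array name `grid' and the shell variable `subsep_prog' are invented for the example.

```shell
# A multi-subscript reference is one string key joined with SUBSEP.
subsep_prog='BEGIN {
    grid[1, 2] = "x"                # stored under the key 1 SUBSEP 2
    key = 1 SUBSEP 2                # build the same key by hand
    print grid[key]                 # refers to the same element
    n = split(key, parts, SUBSEP)   # recover the individual subscripts
    print n, parts[1], parts[2]
}'
awk "$subsep_prog"
```

Because the key is an ordinary string, `split' with `SUBSEP' as the separator is one way to get the original subscripts back, provided none of them contains the `SUBSEP' character itself.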
The special operator `in' may be used in an `if' or `while' statement to see if an array has an index consisting of a particular value.

     if (val in array)
             print array[val]

If the array has multiple subscripts, use `(i, j, ...) in array' to test for existence of an element.

The `in' construct may also be used in a `for' loop to iterate over all the elements of an array. *Note Scanning all Elements of an Array: Scanning an Array.

An element may be deleted from an array using the `delete' statement.

*Note Arrays in `awk': Arrays, for more detailed information.

File: gawk.info, Node: Data Type Summary, Prev: Arrays Summary, Up: Variables/Fields

Data Types
----------

The value of an `awk' expression is always either a number or a string.

Certain contexts (such as arithmetic operators) require numeric values. They convert strings to numbers by interpreting the text of the string as a numeral. If the string does not look like a numeral, it converts to 0.

Certain contexts (such as concatenation) require string values. They convert numbers to strings by effectively printing them with `sprintf'. *Note Conversion of Strings and Numbers: Conversion, for the details.

To force conversion of a string value to a number, simply add 0 to it. If the value you start with is already a number, this does not change it.

To force conversion of a numeric value to a string, concatenate it with the null string.

The `awk' language defines comparisons as being done numerically if both operands are numeric, or if one is numeric and the other is a numeric string. Otherwise one or both operands are converted to strings and a string comparison is performed.

Uninitialized variables have the string value `""' (the null, or empty, string). In contexts where a number is required, this is equivalent to 0.

*Note Variables::, for more information on variable naming and initialization; *note Conversion of Strings and Numbers: Conversion., for more information on how variable values are interpreted.
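The conversion and comparison rules above can be tried out with a short sketch. The variable names (`s', `t', `n', `unset', and the shell variable `conv_prog') are invented for the example, and only standard `awk' features are assumed.

```shell
conv_prog='BEGIN {
    s = "3.5"; t = "abc"
    print s + 0, t + 0          # adding 0 forces a number; a non-numeral gives 0
    n = 17
    print length(n "")          # concatenating "" forces a string of 2 characters
    print ("10" < "9")          # two string constants: string comparison
    print (10 < 9)              # two numbers: numeric comparison
    a = (unset == 0)            # an uninitialized variable compares equal to 0 ...
    b = (unset == "")           # ... and also to the null string
    print a, b
}'
awk "$conv_prog"
```

The third and fourth lines of output differ because `"10"' sorts before `"9"' as a string, while 10 is not less than 9 as a number.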
File: gawk.info, Node: Rules Summary, Next: Functions Summary, Prev: Variables/Fields, Up: Gawk Summary

Patterns and Actions
====================

* Menu:

* Pattern Summary::             Quick overview of patterns.
* Regexp Summary::              Quick overview of regular expressions.
* Actions Summary::             Quick overview of actions.

An `awk' program is mostly composed of rules, each consisting of a pattern followed by an action. The action is enclosed in `{' and `}'. Either the pattern may be missing, or the action may be missing, but, of course, not both. If the pattern is missing, the action is executed for every single line of input. A missing action is equivalent to this action,

     { print }

which prints the entire line.

Comments begin with the `#' character, and continue until the end of the line. Blank lines may be used to separate statements. Normally, a statement ends with a newline; however, this is not the case for lines ending in a `,', `{', `?', `:', `&&', or `||'. Lines ending in `do' or `else' also have their statements automatically continued on the following line. In other cases, a line can be continued by ending it with a `\', in which case the newline is ignored.

Multiple statements may be put on one line by separating them with a `;'. This applies to both the statements within the action part of a rule (the usual case), and to the rule statements.

*Note Comments in `awk' Programs: Comments, for information on `awk''s commenting convention; *note `awk' Statements versus Lines: Statements/Lines., for a description of the line continuation mechanism in `awk'.

File: gawk.info, Node: Pattern Summary, Next: Regexp Summary, Prev: Rules Summary, Up: Rules Summary

Patterns
--------

`awk' patterns may be one of the following:

     /REGULAR EXPRESSION/
     RELATIONAL EXPRESSION
     PATTERN && PATTERN
     PATTERN || PATTERN
     PATTERN ? PATTERN : PATTERN
     (PATTERN)
     ! PATTERN
     PATTERN1, PATTERN2
     BEGIN
     END

`BEGIN' and `END' are two special kinds of patterns that are not tested against the input.
The action parts of all `BEGIN' rules are merged as if all the statements had been written in a single `BEGIN' rule. They are executed before any of the input is read. Similarly, all the `END' rules are merged, and executed when all the input is exhausted (or when an `exit' statement is executed). `BEGIN' and `END' patterns cannot be combined with other patterns in pattern expressions. `BEGIN' and `END' rules cannot have missing action parts. For `/REGULAR-EXPRESSION/' patterns, the associated statement is executed for each input line that matches the regular expression. Regular expressions are extensions of those in `egrep', and are summarized below. A RELATIONAL EXPRESSION may use any of the operators defined below in the section on actions. These generally test whether certain fields match certain regular expressions. The `&&', `||', and `!' operators are logical "and," logical "or," and logical "not," respectively, as in C. They do short-circuit evaluation, also as in C, and are used for combining more primitive pattern expressions. As in most languages, parentheses may be used to change the order of evaluation. The `?:' operator is like the same operator in C. If the first pattern matches, then the second pattern is matched against the input record; otherwise, the third is matched. Only one of the second and third patterns is matched. The `PATTERN1, PATTERN2' form of a pattern is called a range pattern. It matches all input lines starting with a line that matches PATTERN1, and continuing until a line that matches PATTERN2, inclusive. A range pattern cannot be used as an operand to any of the pattern operators. *Note Patterns::, for a full description of the pattern part of `awk' rules.
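A small sketch may make several of these pattern forms concrete. The input text, the markers `START' and `STOP', and the variable names `count', `bodies', `lines', and `pat_prog' are all invented for the example; only standard `awk' features are assumed.

```shell
# Six input lines; the range of interest runs from START to STOP.
lines='intro
START
body 1
body 2
STOP
outro'

pat_prog='
BEGIN             { count = 0 }                         # runs before any input is read
/START/, /STOP/   { count++; print "in range: " $0 }    # range pattern, inclusive
NR > 1 && /^body/ { bodies++ }                          # two patterns combined with &&
END               { print count " range lines, " bodies " body lines" }
'
printf '%s\n' "$lines" | awk "$pat_prog"
```

The range rule fires for the `START' line, the `STOP' line, and everything between them, so `count' ends at 4. The `END' rule runs once, after the last input line has been processed.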