1994-11-21
QTAwk - 1 - QTAwk
See the following for a complete definition of Awk:
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
"The AWK Programming Language"
Addison-Wesley 1988, ISBN 0-201-07981-X
Major differences between QTAwk and Awk.
1. Expanded Regular Expressions
All of the Awk regular expression operators are allowed plus
the following:
a) Complemented character classes, written either in the Awk
notation, '[^...]', or with the Awk/QTAwk and C logical
negation operator, '[!...]'.
b) Matched character classes, '[#...]'. These classes are
used in pairs. The position of the character matched in
the first class of the pair determines the character
which must match in the position occupied by the second
class of the pair.
c) Look-ahead Operator. r@t: regular expression r is
matched only when followed by regular expression t.
d) Repetition Operator. r{n1,n2}: at least n1 and at most n2
repetitions of regular expression r.
e) Named Expressions. {named_expr} is replaced by the
string value of the corresponding variable.
f) Tagged Expressions. Enclosing an expression, or portion
of an expression, in parentheses, "()", makes that
expression available for use with the Tag Operator,
'$$'.
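Several of these operators have rough counterparts in Python's `re` module. The sketch below is a Python model, not QTAwk syntax, and rests on assumptions: that `r@t` does not consume `t` (modelled as the look-ahead `r(?=t)`), that `r{n1,n2}` behaves like the same bounded repetition in `re`, and that a matched class pair can be expanded into an alternation of corresponding character pairs. The helper `matched_pair` is hypothetical.

```python
import re

def matched_pair(first: str, middle: str, second: str) -> str:
    """Hypothetical helper: expand a matched class pair '[#first]' ... '[#second]'
    into an alternation in which the character at position i of `first`
    must be followed (after `middle`) by the character at position i of `second`."""
    if len(first) != len(second):
        raise ValueError("matched classes must have the same length")
    branches = [re.escape(a) + middle + re.escape(b) for a, b in zip(first, second)]
    return "(?:" + "|".join(branches) + ")"

# b) Matched classes: '(' must close with ')', '{' with '}', '[' with ']'.
brackets = re.compile(matched_pair("({[", r"[^)}\]]*", ")}]"))

# c) Look-ahead: 'r@t' modelled as r(?=t); t is required but not part of the match.
digits_before_px = re.compile(r"\d+(?=px)")

# d) Repetition: r{n1,n2} means at least n1 and at most n2 repetitions of r.
two_or_three_ab = re.compile(r"(?:ab){2,3}")

print(bool(brackets.fullmatch("(abc)")), bool(brackets.fullmatch("(abc]")))
print(digits_before_px.search("width: 120px").group())
print(bool(two_or_three_ab.fullmatch("abab")), bool(two_or_three_ab.fullmatch("ab")))
```

The alternation expansion is just one way to express the pairing rule with a standard regex engine; QTAwk's internal representation is not described here.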
2. Consistent statement termination syntax. The QTAwk Utility
Creation Tool utilizes the semi-colon, ';', to terminate all
statements. The practice in Awk of using newlines to
"sometimes" terminate statements is no longer allowed.
3. Expanded Operator Set
The Awk set of operators has been changed to more closely
match those of C. The Awk match operator, '~', has been
changed to '~~' so that the similarity between the match
operators, '~~' and '!~', and the equality operators, '==' and
'!=', is complete. The single tilde symbol, '~', reverts to
the C one's complement operator, an addition to the operator
set over Awk. An explicit string concatenation operator has
also been introduced. The remaining "new" operators in
QTAwk are:
Operation           Operator
tag                 $$
one's complement    ~
concatenation       ∩
shift left/right    << >>
matching            ~~ !~
bit-wise AND        &
bit-wise XOR        @
bit-wise OR         |
sequence            ,
The caret, '^', remains the exponentiation operator. The
symbol '@' is used for the exclusive OR operator. For string
operands, the shift operators, '<<' and '>>', shift the
strings with wrap-around instead of performing a bit shift as
for numeric operands.
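A shift with wrap-around on a string is a rotation. The direction is an assumption: this Python sketch takes `s << n` to move characters toward the front, with the leading characters wrapping to the end, and `s >> n` to do the reverse. `shift_left` and `shift_right` are hypothetical names; the last line simply shows that for numbers the operators are ordinary bit operations, with QTAwk's '@' playing the role of Python's '^'.

```python
def shift_left(s: str, n: int) -> str:
    """Model of a string 's << n': rotate left, wrapping leading characters to the end."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]

def shift_right(s: str, n: int) -> str:
    """Model of a string 's >> n': rotate right, wrapping trailing characters to the front."""
    if not s:
        return s
    n %= len(s)
    return s[len(s) - n:] + s[:len(s) - n]

print(shift_left("abcde", 2))   # cdeab
print(shift_right("abcde", 2))  # deabc
print(5 << 2, 12 ^ 10)          # 20 6  (numeric shift, and XOR written '@' in QTAwk)
```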
4. Expanded set of recognized constants in QTAwk utilities and
input:
a) decimal integers,
b) octal integers,
c) hexadecimal integers,
d) character constants, and
e) floating point constants.
5. Expanded Predefined patterns giving more control:
a) INITIAL - similar to BEGIN. Actions executed after
opening each input file and before reading the first record.
b) FINAL - similar to END. Actions executed after reading
the last record of each input file and before closing the file.
c) NOMATCH - actions executed for each input record for
which no pattern was matched.
d) GROUP - used to group multiple regular expressions for
search optimization. Can speed search by a factor of
six.
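The order in which BEGIN, INITIAL, NOMATCH, FINAL and END fire can be pictured as a driver loop. This Python sketch is a model, not QTAwk's implementation: it assumes NOMATCH fires only when none of the ordinary patterns matched the record and that BEGIN and END run once per invocation. `run` and the rule list are hypothetical, and GROUP is not modelled.

```python
import re
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[str], None]]

def run(files: Dict[str, List[str]], rules: List[Rule], log: List[str]) -> None:
    """Drive a toy pattern/action program over `files` (name -> list of records)."""
    log.append("BEGIN")
    for name, records in files.items():
        log.append(f"INITIAL {name}")     # after opening, before the first record
        for record in records:
            matched = False
            for pattern, action in rules:
                if re.search(pattern, record):
                    action(record)
                    matched = True
            if not matched:
                log.append(f"NOMATCH {record}")
        log.append(f"FINAL {name}")       # after the last record, before closing
    log.append("END")

events: List[str] = []
run({"a.txt": ["x1", "y2"], "b.txt": ["x3"]},
    [(r"^x", lambda rec: events.append(f"match {rec}"))],
    events)
print(events)
```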
6. True Multidimensional Arrays
The use of the comma in index expressions to simulate
multiple array indices is no longer supported. True multiple
indices are supported. Indexing is in the C manner,
'a[i1][i2]'. The SUBSEP built-in variable of Awk has been
dropped since it is no longer necessary. The Awk practice of
only creating array elements which have been referenced is
continued.
7. Integer array indices as well as string indices
Array indices have been expanded to include integers as well
as the string indices of Awk. Indices are not automatically
converted to strings as in Awk. Thus, for true integer
indices, the index ordering follows the numeric sequence with
an integer index value of '10' following a value of '2'
instead of preceding it. Both integer and string indices may
be mixed in referencing the elements of an array. Thus, the
following is supported:
a) A[1]
b) A["WA"]
c) A[1]["MD"]
d) A["WA"][10]
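Keeping integer indices as integers changes their ordering. In this Python model an array is a dict whose keys keep their type, and multidimensional indexing is a dict of dicts. The text does not say how QTAwk orders a mix of integer and string indices, so `ordered_keys` (a hypothetical helper) sorts each kind separately and puts integers first as an arbitrary choice.

```python
from typing import Any, Dict, List, Union

Key = Union[int, str]

def ordered_keys(array: Dict[Key, Any]) -> List[Key]:
    """Integer keys in numeric order, then string keys in string order (assumed)."""
    ints = sorted(k for k in array if isinstance(k, int))
    strs = sorted(k for k in array if isinstance(k, str))
    return ints + strs

awk_style = {str(i): None for i in (1, 2, 10)}   # Awk converts indices to strings
qtawk_style = {i: None for i in (1, 2, 10)}      # QTAwk keeps integer indices

print(sorted(awk_style))          # ['1', '10', '2']
print(ordered_keys(qtawk_style))  # [1, 2, 10]

# A[1]["MD"] and A["WA"][10] modelled as nested dicts with mixed key types.
A: Dict[Key, Any] = {1: {"MD": "x"}, "WA": {10: "y"}}
print(A[1]["MD"], A["WA"][10])    # x y
```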
8. Arrays integrated into QTAwk
QTAwk integrates arrays with arithmetic operators so that the
operations are carried out on the entire array. QTAwk also
integrates arrays into user-defined functions so that they
can be passed to and returned from such functions in a
natural and intuitive manner. Awk does not allow returning
arrays from user-defined functions or allow arithmetic
operators to operate on whole arrays.
9. Use of arrays for true dynamic Regular Expressions. Using
an array in a match operation matches against every
element as a separate regular expression. The internal form
of the combined regular expressions is derived when
first used. The internal form is assigned along with the array
and is not discarded until the array is changed. The user
thus explicitly controls, through the assignment statement, the
retention of the internal regular expression form.
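The caching rule can be sketched as an object that builds the combined expression on first use and discards it whenever an element changes. `RegexArray` is a hypothetical Python class, not part of QTAwk; it assumes the combined form is an alternation of the elements and that a match succeeds when any element matches.

```python
import re
from typing import List

class RegexArray:
    """Model of an array used as a dynamic regular expression."""

    def __init__(self, elements: List[str]) -> None:
        self._elements = list(elements)
        self._compiled = None            # cached combined pattern, or None
        self.compile_count = 0

    def __setitem__(self, index: int, value: str) -> None:
        self._elements[index] = value
        self._compiled = None            # changing the array discards the internal form

    def matches(self, text: str) -> bool:
        if self._compiled is None:       # derived when first used
            combined = "|".join(f"(?:{e})" for e in self._elements)
            self._compiled = re.compile(combined)
            self.compile_count += 1
        return self._compiled.search(text) is not None

pats = RegexArray([r"^ERROR", r"timeout$"])
pats.matches("ERROR: disk full")      # builds the combined form
pats.matches("read timeout")          # reuses it
pats[1] = r"refused$"                 # element changed: cached form dropped
pats.matches("connection refused")    # builds it again
print(pats.compile_count)             # 2
```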
10. New keywords:
a) cycle
similar to 'next' except that the current record may be used
in restarting the outer pattern-matching loop. The built-in
variable 'MAX_CYCLE' limits the number of possible cycles to
prevent infinite loops. The built-in variable
'CYCLE_COUNT' contains the current count of cycles.
b) deletea
similar to 'delete' except that ALL array values are
deleted.
c) switch, case, default
similar to C syntax with the allowed 'switch' and 'case'
values expanded to include any legal QTAwk expression,
evaluated at run-time. The expressions may evaluate to
any value including any numeric value, string or regular
expression. Numeric and string values are compared as
numerics or strings respectively. Regular expression
values are compared as in the match operator, '~~'.
d) local
new keyword to allow the declaration and use of local
variables within compound statements, including
user-defined functions. Using it in user-defined
functions, instead of following the Awk practice of defining
excess formal parameters, leads to functions that are easier
to read and maintain. The C practice of allowing
initialization in the 'local' statement is followed.
e) endfile
similar to 'exit'. Simulates the end of the current input
file only; any remaining input files are still processed.
f) include
A new directive "#include" has been added. This
directive allows the user to include utility files from
within another utility file. This is handy for including
files of often used user-defined functions or constants
without having to name the file on the command line.
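The expanded 'switch'/'case' comparison (item c above) can be modelled as choosing the comparison from the type of the case value. In this Python sketch a compiled `re.Pattern` stands in for a QTAwk regular expression value and is tested as with '~~'; numbers compare as numbers and strings as strings. When a number meets a string, the sketch falls back to string comparison, which is an assumption. `case_matches` and `select` are hypothetical, and C-style fall-through is not modelled.

```python
import re
from typing import Any, Sequence

def case_matches(switch_value: Any, case_value: Any) -> bool:
    """Compare a switch value with one case value."""
    if isinstance(case_value, re.Pattern):
        return case_value.search(str(switch_value)) is not None   # like '~~'
    if isinstance(case_value, (int, float)) and isinstance(switch_value, (int, float)):
        return switch_value == case_value                          # numeric comparison
    return str(switch_value) == str(case_value)                    # string comparison

def select(switch_value: Any, cases: Sequence[Any]) -> int:
    """Index of the first matching case, or -1 for 'default'."""
    for i, case_value in enumerate(cases):
        if case_matches(switch_value, case_value):
            return i
    return -1

cases = [42, "quit", re.compile(r"^[0-9]+\.[0-9]+$")]
print(select(42, cases))       # 0
print(select("quit", cases))   # 1
print(select("3.14", cases))   # 2
print(select("help", cases))   # -1, i.e. default
```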
11. Expanded Arithmetic Functions
QTAwk includes all of the built-in arithmetic functions
supported by Awk plus the following:
a) acos(x)
b) asin(x)
c) cosh(x)
d) jdn() or jdn(y,m,d) or jdn(fdate)
e) fract(x)
f) log10(x)
g) pi() or pi
h) sinh(x)
12. Expanded string functions
QTAwk includes all of the built-in string functions
supported by Awk plus the following:
a) cal(fmt_str,jdn)
b) center(s,w) or center(s,w,c)
c) copies(s,n)
d) deletec(s,p,n)
e) insert(s1,s2,p)
f) justify(a,n,w) or justify(a,n,w,c)
g) overlay(s1,s2,p)
h) remove(s,c)
i) replace(s)
j) sdate(fmt_str) or sdate(fmt_str,fdate) or
sdate(fmt_str,y,m,d)
k) srange(c1,c2)
l) srev(s)
m) stime(fmt_str) or stime(fmt_str,ftime) or
stime(fmt_str,h,m,s)
n) stran(s) or stran(s,st) or stran(s,st,sf)
o) strim(s) or strim(s,c) or strim(s,c,d)
p) strlwr(s)
q) strupr(s)
13. New Miscellaneous functions
a) The function 'rotate(a)' is provided to rotate the
elements of the array a.
b) execute(s) or execute(s,se) or execute(s,se,rf) - execute
string s
c) execute(a) or execute(a,se) or execute(a,se,rf) - execute
array a
d) e_type(e) - return type of expression 'e'
e) findfile(var,pattern,attributes)
f) pd_sym - access pre-defined symbol table
g) ud_sym - access user defined symbol table
h) resetre - reset all regular expressions to utility
start-up condition
14. New I/O functions
I/O function syntax has been made consistent with syntax of
other functions. The redirection operators, '<', '>' and
'>>', and pipeline operator, '|', have been deleted as
excessively error prone in expressions. The functional
syntax of the 'getline' function has been made identical to
that of the other built-in functions. The new functions
'fgetline', 'fprint' and 'fprintf' have been introduced for
reading and writing to files other than the current input
file.
Single character input/output functions have been added:
a) getc() - return next character from current input file,
b) fgetc() - return next character from named file
c) putc(c) - output character c to standard output file
d) fputc(c,F) - output character c to file F
e) srchrecord(sp[,rs[,var]]) - search current input file for
pattern
f) fsrchrecord(fn,sp[,rs[,var]]) - search file fn for
pattern
The dropped file re-direction operator, '>>', has been
replaced by the 'append' function:
append(F) -- Opens the file F for output to the end of the
file. All subsequent output to the file is appended to the
end of the file. This function must be called before the
first output to the file to append. Any output to the file
prior to calling this function will open the file and discard
any existing contents, i.e., truncate to zero length.
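The append rule comes down to choosing the open mode when a file first receives output. The Python model below uses hypothetical names (`OutputFiles`, `append`, `fprint`) that mirror the QTAwk functions only loosely: mode "a" stands for `append(F)` having been called before the first output, and mode "w" for the default truncation. What QTAwk does if `append` is called after output has started is not stated, so the model simply ignores such a call.

```python
import os
import tempfile
from typing import IO, Dict, Set

class OutputFiles:
    """Model of output-file handling: truncate unless append() came first."""

    def __init__(self) -> None:
        self._open: Dict[str, IO[str]] = {}
        self._append: Set[str] = set()

    def append(self, name: str) -> None:
        if name not in self._open:           # only effective before the first output
            self._append.add(name)

    def fprint(self, name: str, text: str) -> None:
        if name not in self._open:           # first output opens the file
            mode = "a" if name in self._append else "w"
            self._open[name] = open(name, mode)
        self._open[name].write(text + "\n")

    def close_all(self) -> None:
        for handle in self._open.values():
            handle.close()
        self._open.clear()

workdir = tempfile.mkdtemp()
kept = os.path.join(workdir, "kept.txt")
replaced = os.path.join(workdir, "replaced.txt")
for path in (kept, replaced):
    with open(path, "w") as f:
        f.write("old\n")

out = OutputFiles()
out.append(kept)              # existing contents of kept.txt are preserved
out.fprint(kept, "new")
out.fprint(replaced, "new")   # no append(): replaced.txt is truncated first
out.close_all()
```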
15. New function to obtain record number of specific file:
get_FNR(file)
16. Expanded capability of formatted Output.
The limited output formatting available with the Awk 'printf'
function has been expanded by adopting the complete output
format specification of the ANSI C standard.
17. Use of 'local' keyword
The 'local' keyword has been introduced to allow for
variables local to user-defined functions (and any compound
statement). This expansion makes the Awk practice of
defining 'extra' formal parameters no longer necessary.
18. Expanded user-defined functions
With the 'local' keyword, QTAwk allows the user to define
functions that may accept a variable number of arguments.
Functions such as finding the minimum/maximum of a variable
number of arguments are possible with one function rather
than defining separate functions for each possible
combination of arguments.
19. User controlled trace capability
A user controlled statement trace capability has been added.
This gives the user a simple to use mechanism to trace
utility execution. Rather than adding 'print' statements,
merely re-defining the value of a built-in variable will give
utility execution trace information, including utility line
number.
20. Expanded built-in variable list
QTAwk includes all of the built-in variables of Awk plus the
following (note: the definition and use of SUBSEP has been
changed from that in Awk; other variables which have been
changed are also listed below):
a) _arg_chk - used to determine whether to check the number of
arguments passed to user-defined functions to ensure that
the number passed agrees with the number defined.
b) ARGI - index value in ARGV of next command line
argument. Gives more control of command line argument
processing.
c) CONVFMT ==> string conversion format for floating point
numbers. Default value of "%.6g". Used only for
converting floating point numbers to strings; OFMT is used
for output.
d) CLENGTH - similar to 'RLENGTH' of Awk. Set whenever a
'case' value evaluates to a regular expression.
e) CSTART - similar to 'RSTART' of Awk. Set whenever a
'case' value evaluates to a regular expression.
f) CYCLE_COUNT - count number of outer loop cycles with
current input record.
g) DEGREES - if TRUE, trigonometric functions assume degree
values, radians if FALSE.
h) DELAY_INPUT_PARSE ==> TRUE/FALSE. Default value = 0.
Used to delay parsing of the current input record until
the value of NF or one of the field variables, $i, 1 ≤ i ≤ NF,
is needed. The default value is false. If the
value is true, then the input record is not parsed until
necessary. For utilities which do not reference NF or
the field variables, $i, in any pattern expressions (or
reference them only in seldom executed pattern expressions),
delaying the parsing of the input record can speed the
execution of the utility significantly. The normal sequence of
execution is:
1: determine the next record according to RS and read the
record,
2: set FNR and NR,
3: parse the record according to FS and set NF and $i, 1 ≤ i ≤ NF,
4: start executing pattern expressions.
Setting DELAY_INPUT_PARSE to a true value delays the third step
in the above sequence until a field variable value or NF is
needed.
i) ENVIRON - one dimensional array with elements equal to
the environment strings passed to QTAwk
j) ECHO_INPUT - controls echo of standard input file to
standard output file.
k) FALSE - predefined with constant value, 0.
l) FIELDFILL ==> used to fill a field when the replacement
value is shorter than the field width or if the number of the
field changed is greater than the number of fields in the
original record. If the field changed is greater than
the number of fields defined in FIELDWIDTHS, the extra
fields created are initialized as null strings separated
by the string value of OFS. This variable value is used
only when field splitting is based on FIELDWIDTHS rather
than FS. The default value is a single blank character.
m) FIELDWIDTHS ==> when assigned a string value containing
space separated integral numbers of the form:
n1 n2 n3 ... nn
the splitting of input records into fields is governed
by the numbers in FIELDWIDTHS rather than FS. Each
number in FIELDWIDTHS specifies the width of a field
including columns between fields. If you want to ignore
the columns between fields, you can specify the width as
a separate field that is subsequently ignored. When the
value of FIELDWIDTHS does not match this form, field
splitting is done using FS in the usual manner. If the
length of the input record is greater than the sum of the
field widths specified in FIELDWIDTHS, QTAwk creates an
additional field and assigns the remainder of the input
record to the field.
n) FILE_SORT - controls the sort order of the file list returned
by findfile. Files may be sorted by name, extension, date,
time and/or size.
o) FILEDATE -- date in DOS format of current input file.
p) FILEDATE_CREATE ==> contains the creation date of the
current input file in the operating system format. The
'sdate' functions may be used to format the date.
Changing the value of this variable has no effect on the
creation date of the current input file. On PC/MS-DOS
systems, the value of this variable equals the value of
FILEDATE.
q) FILEDATE_LACCESS ==> contains the last file access date
of the current input file in the operating system
format. The 'sdate' functions may be used to format the
date. Changing the value of this variable has no effect
on the last access date of the current input file. On
PC/MS-DOS systems, the value of this variable equals the
value of FILEDATE.
r) FILETIME -- time in DOS format of current input file.
s) FILETIME_CREATE ==> contains the creation time of the
current input file in the DOS format. The 'stime'
functions may be used to format the time. Changing the
value of this variable has no effect on the creation time
of the current input file. On PC/MS-DOS systems the
value of this variable equals the value of FILETIME.
t) FILETIME_LACCESS ==> contains the last file access time
of the current input file in the DOS format. The 'stime'
functions may be used to format the time. Changing the
value of this variable has no effect on the last access
time of the current input file. On PC/MS-DOS systems the
value of this variable equals the value of FILETIME.
u) FILESIZE -- size in bytes of current input file.
v) FILEATTR -- file attributes of current input file.
w) FILEPATH ==> contains the drive and path of the current
input file. The path string ends with the subdirectory
separator character. Changing the value of this variable
has no effect on the path of the current input file.
x) FILE_SEARCH -- TRUE/FALSE value to search current input
file for record(s) containing match to regular
expression(s) in FILE_SEARCH_PAT. Default value FALSE.
y) FILE_SEARCH_PAT -- contains one or more regular
expressions for searching current input file.
z) Gregorian - controls use of Gregorian or Julian calendar
in cal and jdn.
aa) IGNORECASE ==> TRUE/FALSE. Default value = 0. If assigned
a true value, QTAwk ignores case in any operation
comparing two strings and in single character
comparisons. The affected comparisons include:
1: any match against regular expressions using the
operators, '~~' and '!~', either explicit or implied,
in pattern expressions,
2: any match against regular expressions using the
operators, '~~' and '!~', either explicit or implied,
in action expressions,
3: any match against regular expressions using the
functions gsub(), index(), match(), split(), sub(),
strim(), srchrecord() and fsrchrecord(),
4: any search using the FILE_SEARCH_PAT variable,
5: any matches of one string against another using the
'==' and '!=' operators,
6: any match of one character against another using the
'==' and '!=' operators,
7: any match of one string against another string in
'switch'/'case' statements,
8: any match of one character against another character
in 'switch'/'case' statements,
9: any match against a regular expression in
'switch'/'case' statements,
10: all searches for record terminator character/strings
using RS,
11: all searches for field terminator character/strings
using FS.
ab) LONGEST_EXP - used to control whether the longest or
the first string matching a regular expression is found
in regular expression matching done in patterns, the
match operators, '~~' and '!~', the 'match' function, the
'sub' and 'gsub' functions and in 'case' regular
expressions.
ac) MAX_CYCLE - maximum number of outer loop cycles
permitted with current input record.
ad) MLENGTH - similar to 'RLENGTH' of Awk. Set whenever a
stand-alone regular expression is encountered in
evaluating a pattern.
ae) MSTART - similar to 'RSTART' of Awk. Set whenever a
stand-alone regular expression is encountered in
evaluating a pattern.
af) NG - equal to the number of the regular expression in a
group matching a string in the current input record.
ag) OFMT ==> output conversion format for floating point
numbers. Default value of "%.6g". Used only for output,
see CONVFMT for internal conversion of floating point
numbers.
ah) QTAwk_Path - initialized from 'QTAWK' environment
variable. Sets paths searched for input files.
ai) RECLEN ==> If assigned a non-zero numeric value, then
the current input file is assumed to consist of fixed
length records with the record length equal to the
integral value of RECLEN. When RECLEN determines input
records, RT is assigned the null string whenever a record
is read from the current input file.
aj) RETAIN_FS - if TRUE the original characters separating
the fields of the current input record are retained
whenever a field is changed, causing the input record to
be re-constructed. If FALSE the output field separator,
OFS, is used to separate fields in the current input
record during reconstruction. The latter practice is the
only method available in Awk.
ak) RT ==> set equal to the record terminator string
whenever a new record is read from the current input
file. If fixed length records are used by setting
RECLEN, then RT is assigned the null string.
al) SPAN_RECORDS -- TRUE/FALSE, default value FALSE. If
TRUE, allows matches to FILE_SEARCH_PAT to span multiple
input records and return multiple records in $0. If
FALSE, matches are confined to a single record. Also
controls matches spanning records in 'srchrecord' and
'fsrchrecord' functions.
am) SUBSEP -- string value used as the array element index
separator in MATCH_INDEX.
an) TRACE - value used to determine utility tracing.
ao) TRANS_FROM/TRANS_TO - strings used by 'stran' function
if second and/or third arguments not specified.
ap) TRUE - predefined with constant value, 1
aq) vargc - used only in user-defined functions defined
with a variable number of arguments. At runtime, set
equal to the actual number of variable arguments passed.
ar) vargv - used only in user-defined functions defined
with a variable number of arguments. At runtime, a
single-dimensioned array with each element set to an
argument actually passed.
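The effect of DELAY_INPUT_PARSE (item h above) is lazy evaluation of field splitting. In this Python model a record splits itself only when NF or a field $i is first requested; with the delay off it splits immediately, as in step 3 of the normal sequence. `Record` and `parse_count` are hypothetical, and FS is simplified to either whitespace splitting or a literal separator.

```python
from typing import List, Optional

class Record:
    """Model of an input record whose field splitting may be deferred."""

    def __init__(self, text: str, fs: Optional[str] = None, delay: bool = False) -> None:
        self.text = text                      # $0
        self.fs = fs                          # None: split on runs of whitespace
        self._fields: Optional[List[str]] = None
        self.parse_count = 0
        if not delay:                         # normal sequence: parse right away
            self._parse()

    def _parse(self) -> None:
        self._fields = self.text.split(self.fs)
        self.parse_count += 1

    def _ensure_parsed(self) -> List[str]:
        if self._fields is None:              # delayed: parse on first use of NF or $i
            self._parse()
        return self._fields

    @property
    def NF(self) -> int:
        return len(self._ensure_parsed())

    def field(self, i: int) -> str:
        """$i; $0 is the whole record and fields past NF are empty."""
        if i == 0:
            return self.text
        fields = self._ensure_parsed()
        return fields[i - 1] if i <= len(fields) else ""

lazy = Record("alpha beta gamma", delay=True)
print(lazy.field(0), lazy.parse_count)           # alpha beta gamma 0
print(lazy.NF, lazy.field(2), lazy.parse_count)  # 3 beta 1
```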
21. Definition of built-in variable, RS, expanded. When a value
is assigned to RS, it is converted to string form. If the
string is longer than a single character, it is treated as a
regular expression, and strings matching the regular
expression act as the record separator. If it is a single
character, that character becomes the record separator. This
is similar in behavior to the field separator, FS.
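The RS rule can be sketched as: a one-character value is a literal separator, a longer value is a regular expression. This Python model (`split_records` is hypothetical) also returns the text that terminated each record, in the spirit of the RT variable. Treating a final record without a terminator as having an empty RT is an assumption, and Awk's special empty-RS paragraph mode is not modelled.

```python
import re
from typing import List, Tuple

def split_records(data: str, rs: str) -> List[Tuple[str, str]]:
    """Return (record, terminator) pairs; the terminator plays the role of RT."""
    if not rs:
        raise ValueError("empty RS is not modelled here")
    sep = re.escape(rs) if len(rs) == 1 else rs   # single char: literal separator
    records: List[Tuple[str, str]] = []
    pos = 0
    for m in re.finditer(sep, data):
        records.append((data[pos:m.start()], m.group()))
        pos = m.end()
    if pos < len(data):                           # trailing record with no terminator
        records.append((data[pos:], ""))
    return records

print(split_records("a;b;c", ";"))
print(split_records("one\n\ntwo\n\n\nthree", r"\n\n+"))
```

Escaping the single character keeps a value such as "." literal rather than letting it act as a regex wildcard.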
22. In QTAwk, setting the built-in variable "FILENAME" to another
value will change the current input file. Setting the
variable in Awk has no effect on the current input file.
23. In QTAwk, setting the built-in variable NF to another value
will change the current contents of $0. If the new value is
greater than the current value, the current input line is
lengthened with new empty fields separated by the output
field separator string, OFS. If the new value is less than
the current value, $0 is shortened by truncating it at the
end of the field corresponding to the new NF value.
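A Python model of assigning NF, assuming whitespace-separated fields: growing NF appends one OFS per new empty field, and shrinking NF cuts $0 just after the last kept field, so the original separators before that point survive. `set_nf` is a hypothetical helper; keeping any leading whitespace of the record is an assumption.

```python
import re

def set_nf(record: str, new_nf: int, ofs: str = " ") -> str:
    """Return $0 after assigning NF = new_nf to a whitespace-separated record."""
    spans = [m.span() for m in re.finditer(r"\S+", record)]
    nf = len(spans)
    if new_nf > nf:                      # lengthen with empty fields separated by OFS
        return record + ofs * (new_nf - nf)
    if new_nf < nf:                      # truncate at the end of field new_nf
        return record[:spans[new_nf - 1][1]] if new_nf > 0 else ""
    return record

print(set_nf("a b c", 5, "-"))   # a b c--
print(set_nf("a  b   c", 2))     # a  b
```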
24. Tag Operator, $$, may be used in manner similar to field
operator, $. The tag operator may be used to obtain or to
set a particular part of the string matching the regular
expression pattern.
25. The return value of the 'getline' function for a valid
record has been changed. The return value is the length of
the record plus the length of the End-Of-Record string
plus 1.
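As a worked example of the rule: a 5-character record ended by a one-character newline returns 5 + 1 + 1 = 7. Because of the added 1, even an empty record read without a terminator returns 1, so a successful read never yields 0, the value Awk's getline returns at end of file. `getline_return` is a hypothetical helper that only restates the arithmetic.

```python
def getline_return(record: str, terminator: str) -> int:
    """getline's value for a successfully read record, per the rule above."""
    return len(record) + len(terminator) + 1

print(getline_return("hello", "\n"))   # 7
print(getline_return("a,b", "\r\n"))   # 6
print(getline_return("", ""))          # 1
```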
26. Fixed length records can be read by setting the value of
RECLEN appropriately.
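Fixed-length input can be modelled by slicing the data every RECLEN characters, with RT always the null string as described for RT and RECLEN above. `fixed_records` is a hypothetical helper, and returning a short final record as-is is an assumption.

```python
from typing import List, Tuple

def fixed_records(data: str, reclen: int) -> List[Tuple[str, str]]:
    """(record, RT) pairs for fixed-length records; RT is always the null string."""
    if reclen <= 0:
        raise ValueError("RECLEN must be a positive integer")
    return [(data[i:i + reclen], "") for i in range(0, len(data), reclen)]

print(fixed_records("ABCDEFGH", 3))   # [('ABC', ''), ('DEF', ''), ('GH', '')]
```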
27. Fixed width fields are supported by setting the value of
FIELDWIDTHS appropriately.
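A Python model of FIELDWIDTHS splitting as described under item m) of the built-in variables: each width takes that many columns, a separator column can be given its own width and ignored, and any remainder of the record becomes one extra field. `split_fixed_width` is hypothetical; it returns None when the value is not of the form "n1 n2 ... nn", the case in which FS would be used instead.

```python
import re
from typing import List, Optional

def split_fixed_width(record: str, fieldwidths: str) -> Optional[List[str]]:
    """Split `record` by FIELDWIDTHS, or return None if FS should be used instead."""
    if not re.fullmatch(r"\s*\d+(\s+\d+)*\s*", fieldwidths):
        return None                         # not of the form "n1 n2 ... nn"
    fields: List[str] = []
    pos = 0
    for width in (int(w) for w in fieldwidths.split()):
        fields.append(record[pos:pos + width])
        pos += width
    if pos < len(record):                   # remainder becomes one extra field
        fields.append(record[pos:])
    return fields

# The width-1 field captures the '-' separator column so it can be ignored.
print(split_fixed_width("ABCD-12-xyz", "4 1 2"))   # ['ABCD', '-', '12', '-xyz']
```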
28. Corrected admitted problems with Awk. The problems
mentioned on page 182 of "The Awk Programming Language" have
been corrected. Specifically:
a) true multidimensional arrays have been implemented,
b) the 'getline' syntax has been made to match that of
other functions,
c) declaring local variables in user-defined functions has
been corrected,
d) intervening blanks are allowed between the function call
name and the opening parenthesis (in fact, under QTAwk it
is permissible to have no opening parenthesis or argument
list for user-defined functions that have been defined
with no formal arguments).