- Submitted-by: rv@erix.ericsson.se (Robert Virding)
-
- I have been implementing a set of string functions for a language
- which we have developed at our lab (Erlang, a real-time, concurrent,
- high-level language for building large, robust systems). As I needed
- some regular expression matching, I decided to take the REs and basic
- functions (match, sub, gsub and split) as defined in "The AWK
- Programming Language" by Aho, Kernighan and Weinberger.
-
- All went well until I started looking at how "awk" handles newlines
- embedded in strings with respect to the beginning- and end-of-string
- anchors (^ and $). I discovered that all the "awk"s we had behaved
- differently. I have tried:
-
- awk - the old "standard" one
- nawk - a "new" one available on SUN (?), compatible with the book
- gawk - GNU awk
- mawk - recently available from comp.sources.reviewed
-
- Both gawk and mawk follow the book and conform to the POSIX 1003.2
- (draft 11) standard.
-
- I enclose some small test files which show how inconsistent the
- systems really are (or seem to be). Even when there were no newlines
- in the string, the results were inconsistent. What does the standard say?
- Any help would truly be appreciated.
-
-
- Robert Virding @ Ellemtel Utvecklings AB, Stockholm
- EMAIL: rv@erix.ericsson.se
-
- P.S. If there is any interest I will summarise the results.
-
- TESTS AND RESULTS
- =================
-
- Test 1
- ------
-
- A simple test which all the "awk"s could run. I must admit I can't really
- understand what gawk and mawk are doing. Nawk is closer to what I
- think would happen, but why isn't the 'f' removed in the second case?
-
- BEGIN {
-     s1 = "foobarbaz"
-     n = split(s1, a1, "^.")
-     print "n = " n
-     for (i = 1; i <= n; i++)
-         print ":" a1[i] ":"
-
-     s2 = "foo\nbar\nbaz"
-     n = split(s2, a2, "^.")
-     print "n = " n
-     for (i = 1; i <= n; i++)
-         print ":" a2[i] ":"
-
-     exit
- }
-
- awk           nawk          gawk          mawk
-
- n = 1         n = 2         n = 9         n = 10
- :foobarbaz:   ::            ::            ::
- n = 3         :oobarbaz:    ::            ::
- :foo:         n = 1         ::            ::
- :bar:         :foo          ::            ::
- :baz:         bar           ::            ::
-               baz:          ::            ::
-                             ::            ::
-                             ::            ::
-                             ::            ::
-                             n = 9         ::
-                             ::            n = 12
-                             ::            ::
-                             ::            ::
-                             :             ::
-                             :             ::
-                             ::            ::
-                             ::            ::
-                             :             ::
-                             :             ::
-                             ::            ::
-                             ::            ::
-                                           ::
-                                           ::
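For comparison, here is a sketch of what Test 1 would print under one reading of the POSIX rules, assumed here rather than quoted from the standard: '^' matches only at the very start of the string (never after an embedded newline), and '.' matches any character, newline included. It uses Python's re module as a stand-in for an awk regex engine, and awk_split is a hypothetical helper, not part of any awk. Under these assumptions both strings split into n = 2 fields: an empty one, then the rest of the string with only the leading 'f' removed.

```python
import re


def awk_split(s, sep):
    """Model awk's split(s, a, sep) for a regular-expression separator.

    Hypothetical helper. Assumes POSIX ERE semantics: '^' anchors only at
    the start of s (never after an embedded newline) and '.' matches any
    character, newline included, hence re.DOTALL and no re.MULTILINE.
    The returned list plays the role of a[1..n]; its length is n.
    """
    if s == "":
        return []  # splitting an empty string yields n = 0
    return re.split(sep, s, flags=re.DOTALL)


# Re-run Test 1 under these assumptions.
for s in ("foobarbaz", "foo\nbar\nbaz"):
    fields = awk_split(s, "^.")
    print("n = " + str(len(fields)))
    for f in fields:
        print(":" + f + ":")
```

Only nawk's first result agrees with this model; its n = 1 for the second string does not.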
-
- Test 2
- ------
-
- This test was impossible for awk (it has no gsub function). I will also
- admit that nawk here seemed to do something which I feel is more reasonable.
-
- BEGIN {
-     s1 = "foobarbaz"
-     gsub("^.", "X&Y", s1)
-     print ":" s1 ":"
-
-     s2 = "foo\nbar\nbaz"
-     gsub("^.", "X&Y", s2)
-     print ":" s2 ":"
-
-     exit
- }
-
- nawk            gawk                            mawk
-
- :XfYoobarbaz:   :XfYXoYXoYXbYXaYXrYXbYXaYXzY:   :XfYXoYXoYXbYXaYXrYXbYXaYXzY:
- :XfYoo          :XfYXoYXoY                      :XfYXoYXoYX
- bar             XbYXaYXrY                       YXbYXaYXrYX
- baz:            XbYXaYXzY:                      YXbYXaYXzY:
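If one assumes that '^' matches only at the very start of the target string (not after an embedded newline, and not again after each replacement), then "^." can match at most once and gsub should make a single substitution, which is what the nawk column shows. The sketch below models that reading in Python; awk_gsub is a hypothetical helper, re is only a stand-in for an awk regex engine, and the '&' handling follows the book's description of sub and gsub ('&' is the matched text, '\&' a literal ampersand).

```python
import re


def awk_gsub(pattern, repl, s):
    """Model awk's gsub(pattern, repl, s); returns (count, new_string).

    Hypothetical helper. In repl, '&' stands for the matched text and
    '\\&' for a literal ampersand. Assumes POSIX ERE semantics: '^'
    anchors only at the start of s and '.' also matches newline.
    """
    def expand(m):
        out = []
        i = 0
        while i < len(repl):
            if repl.startswith("\\&", i):   # \& -> literal '&'
                out.append("&")
                i += 2
            elif repl[i] == "&":            # & -> the matched text
                out.append(m.group(0))
                i += 1
            else:
                out.append(repl[i])
                i += 1
        return "".join(out)

    new_s, count = re.subn(pattern, expand, s, flags=re.DOTALL)
    return count, new_s


# Re-run Test 2 under these assumptions.
print(":" + awk_gsub("^.", "X&Y", "foobarbaz")[1] + ":")     # :XfYoobarbaz:
print(":" + awk_gsub("^.", "X&Y", "foo\nbar\nbaz")[1] + ":")
```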
-
-
- Test 3
- ------
-
- This test was impossible for awk (it has no match function). Mawk's
- results seemed the best to me. I can't understand where nawk got its
- RLENGTH values from.
-
- BEGIN {
-     s1 = "foo"
-     match(s1, /foo$/)
-     print RSTART, RLENGTH
-     print ":" substr(s1, RSTART, RLENGTH) ":"
-
-     s2 = "foo\n\n"
-     match(s2, /foo..$/)
-     print RSTART, RLENGTH
-     print ":" substr(s2, RSTART, RLENGTH) ":"
-
-     exit
- }
-
- nawk      gawk      mawk
-
- 1 4       1 3       1 3
- :foo:     :foo:     :foo:
- 1 6       0 -1      1 5
- :foo      ::        :foo
-
- :                   :
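A sketch of match() under the assumption that '$' matches only at the very end of the string and '.' matches any character, newline included. On that reading "foo..$" matches all of "foo\n\n", giving RSTART 1 and RLENGTH 5, the values mawk printed. awk_match is a hypothetical helper using Python's re as a stand-in. Two caveats: Python's own '$' also matches just before a final newline, so the sketch rewrites a trailing '$' to '\Z'; and Python's engine is leftmost-first rather than POSIX leftmost-longest, which makes no difference for these two patterns because each has only one possible match.

```python
import re


def awk_match(s, pattern):
    """Model awk's match(s, pattern); returns (RSTART, RLENGTH).

    Hypothetical helper. RSTART is 1-based and (0, -1) means no match.
    Assumes POSIX ERE semantics: '$' matches only at the very end of s
    and '.' matches any character, newline included.
    """
    # Python's '$' also matches just before a final newline, so a trailing
    # unescaped '$' is rewritten to '\Z' (end of string only). Other
    # occurrences of '$' are not handled by this sketch.
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1] + r"\Z"
    m = re.search(pattern, s, flags=re.DOTALL)
    if m is None:
        return 0, -1
    return m.start() + 1, m.end() - m.start()


# Re-run Test 3 under these assumptions.
print(*awk_match("foo", "foo$"))         # 1 3
print(*awk_match("foo\n\n", "foo..$"))   # 1 5
```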
-
- The very end
-
-
- Volume-Number: Volume 27, Number 26
-
-