Usenet 1994 January

Text File | 1992-05-20 | 3.1 KB | 146 lines

Submitted-by: rv@erix.ericsson.se (Robert Virding)

I have been implementing a set of string functions for a language which
we have developed at our lab. (Erlang, it's a real-time concurrent
high-level language for building large robust systems) As I had need of
some regular expression matching I decided to take the REs and basic
functions (match, sub, gsub and split) as defined in "The AWK
Programming Language" by Aho, Kernighan and Weinberger.

All went well until I started looking at how "awk" handles newlines
embedded in strings wrt beginning/end of string expressions. I
discovered that all the "awk"s we had acted differently. I have tried:

	awk  - the old "standard" one
	nawk - a "new" one available on SUN (?), compatible with book
	gawk - GNU awk
	mawk - recently available from comp.sources.reviewed

Both gawk and mawk follow the book and conform to POSIX 1003.2
(draft 11) standard.

I enclose some small test files which show how inconsistent the systems
really are (or seem to be). Even when there were no newlines in the
string results were inconsistent.

What does the standard say? Any help would truly be appreciated.

	Robert Virding @ Ellemtel Utvecklings AB, Stockholm
	EMAIL: rv@erix.ericsson.se

P.S. If there is any interest I will summarise the results.

TESTS AND RESULTS
=================

Test 1
------

A simple test which all "awk"s could do. I must admit I can't really
understand what gawk and mawk are doing. Nawk is closer to what I think
would happen. But why isn't the 'f' removed in the 2nd case?

BEGIN {
	s1 = "foobarbaz"
	n = split(s1, a1, "^.")
	print "n = " n
	for (i = 1; i <= n; i++)
		print ":" a1[i] ":"
	s2 = "foo\nbar\nbaz"
	n = split(s2, a2, "^.")
	print "n = " n
	for (i = 1; i <= n; i++)
		print ":" a2[i] ":"
	exit
}

awk           nawk          gawk      mawk
n = 1         n = 2         n = 9     n = 10
:foobarbaz:   ::            ::        ::
n = 3         :oobarbaz:    ::        ::
:foo:         n = 1         ::        ::
:bar:         :foo          ::        ::
:baz:         bar           ::        ::
              baz:          ::        ::
                            ::        ::
                            ::        ::
                            ::        ::
                            n = 9     ::
                            ::        n = 12
                            ::        ::
                            ::        ::
                            :         ::
                            :         ::
                            ::        ::
                            ::        ::
                            :         ::
                            :         ::
                            ::        ::
                            ::        ::
                                      ::
                                      ::

Test 2
------

This test was impossible for awk (no gsub function). I will also admit
that nawk here seemed to do something which I feel is more reasonable.

BEGIN {
	s1 = "foobarbaz"
	gsub("^.", "X&Y", s1)
	print ":" s1 ":"
	s2 = "foo\nbar\nbaz"
	gsub("^.", "X&Y", s2)
	print ":" s2 ":"
	exit
}

nawk             gawk                              mawk
:XfYoobarbaz:    :XfYXoYXoYXbYXaYXrYXbYXaYXzY:     :XfYXoYXoYXbYXaYXrYXbYXaYXzY:
:XfYoo           :XfYXoYXoY                        :XfYXoYXoYX
bar              XbYXaYXrY                         YXbYXaYXrYX
baz:             XbYXaYXzY:                        YXbYXaYXzY:

Test 3
------

This test was impossible for awk (no match function). Mawk here seemed
the best to me. Where nawk got its RLENGTH values from I can't
understand.

BEGIN {
	s1 = "foo"
	match(s1, /foo$/)
	print RSTART, RLENGTH
	print ":" substr(s1, RSTART, RLENGTH) ":"
	s2 = "foo\n\n"
	match(s2, /foo..$/)
	print RSTART, RLENGTH
	print ":" substr(s2, RSTART, RLENGTH) ":"
	exit
}

nawk      gawk      mawk
1 4       1 3       1 3
:foo:     :foo:     :foo:
1 6       0 -1      1 5
:foo      ::        :foo

:                   :

The very end

Volume-Number: Volume 27, Number 26
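
Several of the results above contain embedded newlines, which is what
makes the side-by-side columns hard to compare. A minimal sketch of one
way to make them visible follows. It was not part of the original tests:
the names show_visible and visible are hypothetical, and it assumes an
awk that supports user-defined functions (so not the old "standard"
awk). Only the first output line is fixed by the code itself; the second
line depends on how the awk in use treats "." and "$" around newlines,
so no expected value is given for it.

```sh
#!/bin/sh
# Sketch only: show_visible and visible are hypothetical names,
# not part of the original tests.
show_visible() {
    awk '
    # visible(s): copy s one character at a time, writing each
    # newline as the two characters backslash and n.
    function visible(s,    out, i, c) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            out = out (c == "\n" ? "\\n" : c)
        }
        return out
    }
    BEGIN {
        # The string used in Tests 1 and 2, printed on a single line.
        s2 = "foo\nbar\nbaz"
        print ":" visible(s2) ":"

        # Test 3 again, with the matched text made visible.
        # RSTART and RLENGTH here are whatever this awk reports.
        s3 = "foo\n\n"
        match(s3, /foo..$/)
        print RSTART, RLENGTH, ":" visible(substr(s3, RSTART, RLENGTH)) ":"
        exit
    }'
}

show_visible
```

Wrapping every printed field in visible() keeps each result on one line,
so the output of different awks can be compared with diff rather than
by eye.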