home *** CD-ROM | disk | FTP | other *** search
- From: henry@zoo.toronto.edu (Henry Spencer)
-
- In article <487@usenix.ORG> karl@IMA.ISC.COM (Karl Heuer) writes:
- >In an environment where the digraph "ch" collates as a single element, what
- >happens if an attempt is made to match the subject string "chi" with the
- >pattern "[c[.ch.]]i" or "[c[.ch.]]hi"? Is the implementation required to
- >report a successful match in both cases? If so, it would seem necessary to
- >use a nondeterministic finite automaton or equivalent, thus making simple
- >regexp matching and filename globbing as complex as egrep pattern matching.
-
- Looking at draft 10, I don't think there is much doubt that they both must
- match, assuming those are legal regular expressions. (If "c" is not a
- collating element or noncollating character, they're not.) If both "c"
- and "ch" are valid collating elements, the bracket expression must be
- prepared to match either.
-
- The wording could stand improving.
-
- As for the implementation aspects, yes, this is a pain. However, there
- is basically no such thing as "simple" regexp matching. :-) The extra
- complexity added by multicharacter collating elements, while annoying,
- is not that big a deal. I think Karl is confused. *Any* non-trivial
- regexp matching ends up using either nondeterministic or deterministic
- automata, sometimes behind clever plastic disguises. The very simplest
- forms, like globbing, sometimes can get away without having to compile
- the regexp string into an internal form, by running a nondeterministic
- automaton directly from the regexp string. That will get a bit harder
- because of the greater complexity of 1003.2 regexps. However, there
- is no way that even "simple" regexp matching (I assume Karl is thinking
- of things like ed) is viable without a compilation step.
-
- Given that 1003.2 defines -- finally! -- library functions for regexp
- work of various kinds, including globbing, the complexity will, in any
- case, be localized to library functions. The programmer won't have to
- worry about it unless he's actually writing those library functions.
- (*That* won't be fun.)
- --
- TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
- OSI: handling yesterday's loads someday| henry@zoo.toronto.edu utzoo!henry
-
- Volume-Number: Volume 21, Number 95
-
-