- Submitted-by: jfh@rpp386.cactus.org (John F Haugh II)
-
- In article <1991Nov21.235529.9196@uunet.uu.net> gwyn@smoke.brl.mil (Doug Gwyn) writes:
- >No, to the contrary the existing regexp implementation was acultural;
- >you're referring to the idea that "[a-z]" for example ought to mean
- >"match any lowercase character in the current locale", but that is
- >NOT what it meant. It actually meant "match any byte having value
- >between the values I gave you around the dash-representation" (this
- >already was important to understand on machines that preferred
- >EBCDIC codesets, for example). You should keep in mind that you as
- >a user are inputting BITS into these patterns, some bytes of which
- >have special interpretation ([, ^, -, etc.) and others taken
- >literally as standing for their values. The ethnocentricity was
- >introduced by 1003.2, presumably because people thought it would be
- >"nice" to be able to specify locale-dependent character classes; it
- >did not inhere in the previous regexp mechanism.
-
- I would say that POSIX completely ignored any codeset which was not
- 7-bit clean ASCII. The simple issue of 8-bit code points being
- mangled by ISTRIP is clear proof of this point. The definition of
- this flag is in terms of bit widths rather than character sizes.
- Any 8-bit code set (such as the European character sets or even EBCDIC)
- is mangled by the translation ISTRIP performs.
-
- I am certain that the various groups did give some thought to the
- issue, but it really is pretty obvious that 1003.1 completely ignored
- any system which uses 8-bit character sets.
-
- While 1003.1 was off inventing a new tty subsystem, it would have
- been nice if they invented an interface for setting any locale-specific
- traits of the tty system (a "tcsetlocale()" sort of deal) that would
- provide for translations of locale-specific characters (the variously
- accented vowels, for example) into something more POSIX-friendly.
- --
- John F. Haugh II |I am the NRA. | UUCP: ...!cs.utexas.edu!rpp386!jfh
- Ma Bell: (512) 255-8251 |Take a friend shooting.| Domain: jfh@rpp386.cactus.org
- " ... expectation is the mother of disappointment."
- -- Brad Konopik
-
- Volume-Number: Volume 26, Number 17
-
-