NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-18 | 4.0 KB

Path: sparky!uunet!uunet!not-for-mail
From: gwyn@smoke.brl.mil (Doug Gwyn)
Newsgroups: comp.std.unix
Subject: Re: POSIX update
Date: 19 Aug 1992 12:59:15 -0700
Organization: U.S. Army Ballistic Research Laboratory, APG, MD.
Lines: 61
Sender: sef@ftp.UU.NET
Approved: sef@ftp.uucp (Moderator, Sean Eric Fagan)
Message-ID: <16u96jINNlej@ftp.UU.NET>
References: <16b9drINNero@ftp.UU.NET> <16ccqfINNrss@ftp.UU.NET> <16h5a2INNko4@ftp.UU.NET>
NNTP-Posting-Host: ftp.uu.net
X-Submissions: std-unix@uunet.uu.net

Submitted-by: gwyn@smoke.brl.mil (Doug Gwyn)

In article <16h5a2INNko4@ftp.UU.NET> peterw@spaten.sharebase.com (Peter Wisnovsky) writes:
>... my impression of what the potential problem is that if the standards
>orgs went ahead and used wchar_t as the vehicle for supporting Unicode,
>then existing conformant implementations that define wchar_t to be an
>8-bit or 32-bit value would be made non-conformant. NT is not
>non-conformant in itself...but the solution it proposes to the storage
>of Unicode data, that is, the fixing of the size of wchar_t to be
>16 bits wide, would not work with existing conformant systems.

This represents a misunderstanding of the role of standards and of
wchar_t in particular. The C standard requires that a conforming
implementation define a wchar_t type for the internal representation
of whatever multibyte character sequences it chooses to support;
however, it does not mandate that any particular multibyte encoding
be supported. A vendor may conform to the C standard while not
supporting Unicode, for example. It is left as an implementation
"marketing decision" just how extensive the multibyte support will be.

Certainly, an implementation that chose to ignore genuine extended
character sets and define wchar_t as an 8-bit type will not be able
to at the same time support 16-bit Unicode.
If and when such implementations are revised to properly support
extended character sets, they will have to make wchar_t at least 16
bits, as most international implementations of Standard C already do.
Since wchar_t is an internal data type it is not really relevant to
issues of data interchange among different systems; that does not lie
within the scope of the programming language standard.

>Two proposals were discussed at the Unicode conference wrt XPG. One
>would be to have the standards changed so that wchar_t would be
>defined to be 16-bits wide; the other would be to create a new
>datatype, `unichar'. The sentiment at the conference was to create a
>new datatype.

If that accurately reflects the discussion, then it merely serves to
confirm the widespread impression that the Unicode proponents don't
understand the existing standards nor the magnitude of the effects of
addition of new data types to existing languages.

It is already an easy matter to, for purposes of conformance with
other standards, add non-conflicting requirements (such as at least
16 bits in wchar_t) beyond base standards. For example, POSIX.1 adds
library requirements beyond those specified in the C standard. There
is no need to "change" the base standards in such cases.

Standard C's multibyte character support was designed in close
consultation with many individuals and organizations who had long
been involved in "internationalization" issues. ITSCJ particularly
comes to mind, and they have continued to work on improvements within
the C multibyte character model. Originally they too suggested a
separate data type for "long" character encodings, and I proposed an
alternate suggestion that also introduced a new data type (my type
would have been used for sub-character bytes, however, reserving
"char" for the sole character type, thus immensely simplifying
programming).
After considerable debate and several committee and working subgroup
meetings, consensus was reached on the multibyte external
sequence/wchar_t internal encoding approach. It would behoove the
Unicode proponents to fully understand those deliberations and the
resulting design before they further bollix up the works.

Volume-Number: Volume 29, Number 7
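
The "multibyte external sequence / wchar_t internal encoding" model
described in the article can be sketched in Standard C roughly as
follows. This is only an illustration and not part of the original
post: the names wchar_bits and decode_multibyte are hypothetical
helpers invented here, and only mbstowcs and the standard headers come
from the C library. The width it reports is whatever the
implementation chose, which is the point the article makes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: width of wchar_t in bits on this implementation.
 * The C standard leaves this implementation-defined. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Hypothetical helper: decode an external multibyte sequence into the
 * internal wchar_t representation, using the current locale's encoding.
 * Stores at most capacity - 1 wide characters plus a terminator.
 * Returns the number of wide characters stored (terminator excluded),
 * or (size_t)-1 if the input is not a valid multibyte sequence. */
size_t decode_multibyte(const char *external, wchar_t *internal,
                        size_t capacity)
{
    size_t n;

    assert(external != NULL && internal != NULL && capacity > 0);

    n = mbstowcs(internal, external, capacity - 1);
    if (n != (size_t)-1)
        internal[n] = L'\0';   /* mbstowcs does not terminate when full */
    return n;
}
```

A program would normally call setlocale(LC_CTYPE, "") from <locale.h>
first, so that mbstowcs uses the environment's multibyte encoding
rather than the default "C" locale. A further standard layered on top
of C could then require, say, wchar_bits() >= 16 without changing the
base standard.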