From jsq Fri Apr 1 14:34:57 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA09140; Fri, 1 Apr 88 14:34:57 EST
From: Ghie-Hugh Song <gs732%uxe.cso.uiuc.edu@uxc.cso.uiuc.edu>
Newsgroups: comp.std.unix
Subject: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <10317@uunet.UU.NET>
Sender: std-unix@uunet.UU.NET
Reply-To: gs732%uxe.cso.uiuc.edu@uxc.cso.uiuc.edu (Ghie-Hugh Song )
Date: 29 Mar 88 03:06:55 GMT
Apparently-To: std-unix-archive
Status: O
From: gs732%uxe.cso.uiuc.edu@uxc.cso.uiuc.edu (Ghie-Hugh Song )
Hello, everyone,
Have you ever dreamed that TeX were more WYSIWYG or that you could type
Greek characters in the text mode directly? If we had an extended
256-character 8-bit ASCII set such as the IBM PC's (see the Appendix of
the PC DOS Manual), things would be much easier.
Then why not use WordPerfect or MS Word? First, they do not support all
the Greek characters and math symbols unless we
buy extra software and hardware. IBM's extended ASCII has
no 'Greek tau', 'Greek nu', or inverted Greek capital delta symbol
for partial differential equations. Even the registered trademark
sign 'R in a circle' does not exist. Then you might ask, why not ChiWriter
or T-cube? Simply, they are not portable! They are graphics programs.
They are not public-domain. One of them is really expensive.
So TeX has been thought to be a better choice for technical writers, not
because it is easy to use but because the text is portable
and it is sometimes more versatile than PC word processors. In fact,
without a laser printer or VorTeX and a graphics workstation, TeX is
not as useful as PC word processors. So what we need should 1) use ASCII
text files for portability and 2) be easy to use. Then how about having
Greek characters and math symbols in the ASCII character set itself?
I've got an idea for all of us, and I wish to write a letter to
the ANSI people about a new 256-character 8-bit extended ASCII standard.
But I don't know ANSI's address. So if you agree with my idea,
please forward this message to ANSI with your opinion.
Let's have an Ext key on our keyboards at the same location as the
'Alt' key on IBM's Enhanced keyboard. (I am using the term 'Ext' to
distinguish it from GNU-Emacs' meta-editing keys. However, the real name
of the second half of this extended ASCII set should be
'meta', since that is what it is called in the termcap files.)
Ext-p (F0-hexdec) will give us a printable character Greek-pi,
and Ext-shift-p (D0-hexdec) will give us a printable character
Greek-Pi (capital-pi) directly. IBM's Greek-pi is at E3
hexadecimal, which corresponds to 'c' (63-hexdec) among the
128 7-bit ASCII codes. So every word processor has its own way of
producing pi, which lowers the portability of word-processed texts.
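The two key codes quoted above (Ext-p at F0, Ext-shift-p at D0) are consistent with simply setting the high bit of the 7-bit ASCII code for the same key. A minimal Python sketch, assuming that rule holds for the whole draft; the function name `ext_code` is hypothetical and not part of the proposal:

```python
def ext_code(ch: str) -> int:
    """Return the proposed 8-bit code for Ext + a 7-bit ASCII key.

    Assumption: the Ext key simply sets bit 7 of the ASCII code.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("expected a 7-bit ASCII character")
    return code | 0x80


print(hex(ext_code("p")))  # 0xf0, Ext-p
print(hex(ext_code("P")))  # 0xd0, Ext-shift-p
print(chr(0xE3 & 0x7F))    # c, the 7-bit key under IBM's pi at E3
```

Under this rule the Greek letter sits on the key of its Latin look-alike, which the IBM assignment (pi on 'c') does not do.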
At the end of this posting, I present my draft. Please
examine it.
One may oppose this draft because existing printers might not be usable.
We can still use them with a mere printer driver and translator
software, as long as the original text does not contain any of the characters
not supported by the printer.
I understand that the standardization of 8-bit extended ASCII
is too late. However, I believe that once this is implemented in
a new version of UNIX or POSIX, everyone will slowly follow
it. People are now gathering to standardize UNIX, POSIX, SVID,
or whatever. Now is the time to express our opinion to the ANSI people.
If we lose this chance we will never have a standard 8-bit ASCII.
If you agree with my idea, write a letter to ANSI, POSIX committee
(IEEE CS/P1003), and the acting System V.4 committee members of
AT&T-Sun-Unisys(?) immediately for their
prompt action. Unfortunately, I do not know any of those addresses.
I really do not know whether I am the first to make this effort,
nor do I know whether ANSI has already made such an extended ASCII.
Since no text-mode terminal has inherent math fonts, I think there is no
such standard so far.
More than one half of college graduates in the world are either
engineers, scientists, or medical doctors. They need English with Greek
characters and math fonts to write reports, homework, and papers. The need
sometimes exceeds that of their own language support. They need
a knowledge bank that can save great ideas like
E = h-bar.omega : Einstein's photoelectric effect,
E = mc^2 : Einstein's relativistic energy
without backslashes or $'s, and yet portable.
Thank you for your attention.
G. Hugh Song
Coordinated Science Lab.
Univ. of Illinois at Urbana-Champaign
1101 W. Springfield Av.
Urbana, IL 61801
song@uispg.csl.uiuc.edu
============================================================
Here is my draft of a new 256-character 8-bit ASCII set. I place the
second half of the 8-bit characters (128-255) next to the first half.
I have not decided what to assign to the following Ext-control keys
(80-hexdec to 9f-hexdec). There are two options:
1) We can assign new control keys which have become necessary
as computer science evolves. Some examples are shown below.
I wish that someone in the field would rearrange the assignments and
complete this, since I do not have enough knowledge of the current
implementation status of i/o utilization.
2) Or we may give some freedom to the manufacturers of keyboards
and terminals.
Even though these (00-hexdec to 1f-hexdec and 80-hexdec to 9f-hexdec)
are not legitimately printable while editing a text file, I wish there were
corresponding printable characters, such as graphical framing characters
as in the IBM PC or a triangle pointing left for ^H, not just the current
'^' notation, which cannot be distinguished from 5e-hexdec. It would ease
debugging communication problems.
| 00 ^@ nul   80 sml  decreases character size and increases it back
  01 ^a soh   81
  02 ^b stx   82 bld  boldifies and unboldifies (toggle)
  03 ^c etx   83
  04 ^d eot   84 dwn  steps down one half line spacing
  05 ^e enq   85
  06 ^f ack   86
  07 ^g bel   87 grp  enters and exits graphics mode
| 08 ^h bs    88 hlp  invokes help universally
  09 ^i ht    89 itl  italicizes and deitalicizes from now on
  0a ^j nl    8a
  0b ^k vt    8b mlm  mouse left movement      \
  0c ^l np    8c mlb  mouse left button        |
  0d ^m cr    8d mmb  mouse middle button      |
  0e ^n so    8e mdm  mouse downward movement  | Important!
  0f ^o si    8f                               | no matter what these are,
| 10 ^p dle   90 mum  mouse upward movement    | meta-control keys or
  11 ^q dc1   91 mrm  mouse right movement     | escape sequences.
  12 ^r dc2   92 mrb  mouse right button       /
  13 ^s dc3   93 scr  scripticizes or unscripticizes (toggle)
  14 ^t dc4   94
  15 ^u nak   95 up   steps up one half line spacing
  16 ^v syn   96 rev  reverses or reverses back the characters' black and white
  17 ^w etb   97
| 18 ^x can   98
  19 ^y em    99
  1a ^z sub   9a
  1b ^[ esc   9b atn  escapes during communication, calling the attention of
                      the local control
  1c ^\ fs    9c
  1d ^] gs    9d
  1e ^^ rs    9e
  1f ^_ us    9f
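The left half of the table follows the usual ASCII rule that a control code keeps only the low five bits of the key, and the right half appears to be the same code with bit 7 set. A short Python sketch of both rules; the helper names `ctrl_code` and `ext_ctrl_code` are hypothetical, and the second rule is an assumption read off the table rather than something the draft states:

```python
def ctrl_code(key: str) -> int:
    """7-bit control code for Ctrl + key (for example ^H), low five bits only."""
    return ord(key.upper()) & 0x1F


def ext_ctrl_code(key: str) -> int:
    """Assumed Ext-control code: the control code with bit 7 set (80-9f)."""
    return ctrl_code(key) | 0x80


for key in "bhis":
    # Prints the 7-bit code and the assumed Ext-control code side by side.
    print(f"^{key}  {ctrl_code(key):02x}  {ext_ctrl_code(key):02x}")
```

Several mnemonics in the right column start with the letter of the matching control key (bld on b, hlp on h, itl on i, scr on s), which is what makes the layout easy to remember.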
Now in the following we have printable characters, except the 'DEL'
key at the end of the lower 7-bit codes. The Alt key may be used to send
the 8-bit code to the host computer
by simulating this key with Kermit's 'set key' command, as in
MSKERMIT version 2.30.
For the 7-bit terminal environment, in which 8-bit signals are not
generated or received by the terminal,
such as VT100, it is desirable for the C-shell or the editor to have a key
which tells the host computer that the next key is one of the upper
8-bit codes (128-255). This key should not contradict with a control key
of the existing editor programs. The 'esc' key might be thought the best
choice. However, most editor programs use this key heavily for some other
purposes. To avoid conflict, the 'cr (Cntrl-m)' key, which is redundant
both in vi and in GNU-Emacs (you might have noticed that 'C-m' is
changed to 'nl (C-j)' automatically by both editors), may be used.
This will limit the use of the Meta key in our (or Stallman's) GNU-Emacs.
It actually requires no revision of GNU-Emacs. We just use the ESC key
to invoke the Meta editing keys, even if the keyboard has a Meta key.
This is the price we pay
for those Greek characters and math symbols. If we use 'Cntrl-h' for
the real backspace, we have to choose another key for
invoking 'help' in GNU-Emacs. How about 'Ext-Cntrl-h' ('88-hexdec')
(or 'C-m C-h' on a 7-bit terminal) as the key for invoking help
in the future version (Ver. 19) of GNU-Emacs? This is the only change
which is not compatible with the present version (Ver. 18).
I'd like to suggest that 'Ext-Cntrl-h (88-hexdec)', or 'Cntrl-m Cntrl-h'
on a 7-bit terminal, be a new standard key for invoking help in
every software package in the future. Isn't it a good idea?
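The C-m prefix scheme for 7-bit terminals can be sketched as a small decoder on the host side. This is only an illustration under stated assumptions: the name `decode_7bit` is hypothetical, the prefix is taken to set bit 7 of the single key that follows it, and a trailing prefix with no following key is silently dropped.

```python
META_PREFIX = 0x0D  # C-m, the prefix key suggested in the text


def decode_7bit(stream: bytes) -> bytes:
    """Expand 'C-m, key' pairs from a 7-bit terminal into 8-bit codes."""
    out = bytearray()
    pending = False  # True right after a prefix has been seen
    for b in stream:
        if pending:
            out.append(b | 0x80)  # set bit 7 on the key after the prefix
            pending = False
        elif b == META_PREFIX:
            pending = True
        else:
            out.append(b)
    return bytes(out)


print(decode_7bit(b"2\x0dpr").hex())  # 32f072
print(decode_7bit(b"\x0d\x08").hex())  # 88
```

With this rule a plain carriage return can no longer be entered as itself; how to escape it is left open in this sketch.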
| 20 sp a0 a horizontal bar longer than just '-'.
21 ! a1 a black square
22 " a2 the starting double quotation mark
23 # a3 \neq : not-equal sign '/=' in one character site
24 $ a4 the Pound symbol (U.K. money unit)
25 % a5 \div : the division symbol, ':-' in one character site
26 & a6 \cap : the common set in set theory, The inverted 'U'.
27 ' a7 the starting single quotation mark
| 28 ( a8 the top portion of the left parenthesis
29 ) a9 the top portion of the right parenthesis
2a * aa a small circle at the ' level that usually represents degree
2b + ab \pm : '+-' in one character site with + up and - down.
2c , ac the cedilla symbol without c, s, or C.
2d - ad \mp : '-+' in one character site with - up and + down.
2e . ae \cdot : a dot at the center
2f / af $\dot $ : a dot at the top
| 30 0 b0 the bottom portion of the right parenthesis
31 1 b1 \propto : the proportionality symbol, 'oc' in one character site
32 2 b2 \bigcirc : a big circle.
33 3 b3 \prime \prime \prime : triple-prime
34 4 b4 a vertical line with a wart in the middle as in '{'
35 5 b5 a vertical line with a wart in the middle as in '}'
36 6 b6 \partial : the mirror image of '6'
37 7 b7 \cup : the symbol in the set theory, that looks like 'U'
| 38 8 b8 \infty : the infinity symbol, 'oo' in one character site
39 9 b9 the bottom portion of the left parenthesis
3a : ba $\ddot $ : the umlaut, two dots overhead.
3b ; bb \prime \prime : the double-prime
3c < bc \le : '_<' in one character site
3d = bd \equiv : '=_' in one character site for the defining equality
3e > be \ge : '_>' in one character site
3f ? bf \supset : superset symbol
| 40 @ c0 the registered trademark sign, a small capital R in a circle
41 A c1 angstrom, a small circle on top of 'A'
42 B c2 \rightarrow : an arrow heading east
43 C c3 \copyright : a small capital 'C' in a circle
44 D c4 \Delta
45 E c5 \in : 'an element of' symbol in set theory
46 F c6 \Phi
47 G c7 \Gamma
| 48 H c8 \hbar : accented italic h for the Planck constant
49 I c9 the top portion of the integral symbol
4a J ca the bottom portion of the integral symbol
4b K cb \simeq :a set symbol (obtained from U by rotating it 90 deg CW)
4c L cc \Lambda
4d M cd \subset: symbol in the set theory
4e N ce \nabla : inverted Greek-capital-Delta
4f O cf \Omega
| 50 P d0 \Pi
51 Q d1 \Theta
52 R d2 \surd : also makes a \sqrt if combined with underlines (__)
53 S d3 \Sigma
54 T d4 the trade mark sign, the superscripted 'TM'
55 U d5 \Upsilon
56 V d6 \leftarrow : an arrow heading west
57 W d7 \ddag : the double dagger symbol used for a footnote.
| 58 X d8 \Xi
59 Y d9 \Psi
5a Z da \downarrow : an arrow heading south.
5b [ db \lceil : a vertical line whose top is clamped to the right
5c \ dc \times : 'x' without serif, math symbol for a multiplication
5d ] dd \rceil : a vertical line whose top is clamped to the left
5e ^ de $\check $ : an accent symbol inverted from '^'
5f _ df $\overline $ : a long bar on top.
| 60 ` e0 \prime (60-hexdec is a back-prime)
61 a e1 \alpha
62 b e2 \beta
63 c e3 \chi
64 d e4 \delta
65 e e5 \epsilon
66 f e6 \phi
67 g e7 \gamma
| 68 h e8 \eta
69 i e9 \iota
6a j ea \smallint : the integral symbol, elongated s
6b k eb \kappa
6c l ec \lambda
6d m ed \mu
6e n ee \nu
6f o ef \omega
| 70 p f0 \pi
71 q f1 \theta
72 r f2 \rho
73 s f3 \sigma
74 t f4 \tau
75 u f5 a wiggle positioned at the underline(_) level.
76 v f6 \vec the short arrow symbol that represents a vector
77 w f7 $\dagger $ : the dagger symbol used for a Hermitian conjugate
| 78 x f8 \xi
79 y f9 \psi
7a z fa \zeta
7b { fb \lfloor : a vertical line whose bottom is clamped to the right
7c | fc \| : two vertical lines in one character site
7d } fd \rfloor : a vertical line whose bottom is clamped to the left
7e ~ fe \sim : a wiggle positioned at the center level
- - - - - - - - - - - - - - - - - - -
7f del ff erh erase the character at the current cursor position
-------------------------------------------------------------
These can all reside in the text mode in 8-bit mode, so that any text
mode terminal can display them directly on the text mode screen.
The possible benefits of this extension are:
1. If every typesetting program is revised according to the new standard,
they will become more WYSIWYG. It means we do not need to type '\alpha'
while typing a TeX file.
2. Word processors and typesetting programs will be cheaper since they
do not need to include soft-font files or hard-font ROMs.
3. Word processor files can easily be exported and imported from one
word processor to another without losing special characters, as
long as they reside in the 256-character set.
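Benefits 1 and 3 amount to a table lookup between the draft codes and TeX macros. A minimal Python sketch, assuming a text stored in the draft encoding is to be exported as TeX source; only six rows of the draft table are copied here, and the names `DRAFT_TO_TEX` and `to_tex` are hypothetical:

```python
# A few rows copied from the draft table; the remaining rows are omitted.
DRAFT_TO_TEX = {
    0xC8: r"\hbar",
    0xD0: r"\Pi",
    0xE1: r"\alpha",
    0xE2: r"\beta",
    0xEF: r"\omega",
    0xF0: r"\pi",
}


def to_tex(data: bytes) -> str:
    """Replace draft 8-bit codes with TeX macros; pass 7-bit ASCII through."""
    parts = []
    for b in data:
        if b < 0x80:
            parts.append(chr(b))
        elif b in DRAFT_TO_TEX:
            # A trailing space ends the macro name so the next letter is not absorbed.
            parts.append(DRAFT_TO_TEX[b] + " ")
        else:
            raise ValueError(f"no TeX mapping for 0x{b:02x} in this sketch")
    return "".join(parts)


print(to_tex(b"E = \xc8\xef"))  # E = \hbar \omega (with a trailing space)
```

The same dictionary read in the other direction would let a TeX-aware editor store '\alpha' as the single byte e1.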
In addition to this new extended ASCII, I think that some of the
present ASCII characters should be redesigned from the present
ones as follows:
" 22 should be designed to look more like the closing double
quotation mark as in typeset books.
' 27 the closing single quotation mark or apostrophe
same comment as above (" 22-hexdec)
* 2a position this a little higher than the present height
so that it looks like a footnoting symbol, not like a multiplication
symbol.
/ 2f stretch this so that two of these can be connected without breaking
to make a long slanted line.
\ 5c the same comment as above
_ 5f the same comment as above so that it should be \underbar{ }
| 7c make this a single long vertical line rather than the present
one broken at the middle.
The current ANSI standard for erasing the previous character is DEL,
not backspace! Let us encourage everyone to observe this standard.
I know that the troublemaker IBM does not follow this standard.
Let them go their way. We do not care for IBM. We are talking about UNIX
and GNU-Emacs and TeX. Then backspace will do the following job in
GNU-Emacs and vi.
^h 08 bs a backspace key without erasing the previously typed
character, making an overprinted image when printed. I think this
special function of this key is actually in the present
ANSI standard. You might have noticed that the UNIX 'man'ual
pages contain '^H' in their text files for underlining.
It seems now fully supported by most ANSI terminals. (But not on
IBM's) Nevertheless, it is not supported by vi or GNU-Emacs.
Let's encourage Mr. Stallman to support this in his new
version of GNU-Emacs. It will display every accented
vowel for foreign alphabets, the
cent (money unit), some foreign money units, the C-cedilla
('Ext-,-backspace-c'), and the null set symbol ('0/' in one
character site).
To edit this backspace we need a special character for
it. A hollow triangle pointing to the left is good
enough. Also, Emacs (maybe not vi) will have a mode to view
this while editing.
^m 0d met In due consideration, the mnemonic should be changed from
'cr' to 'met'a.
==========================================
KEYBOARD
-----------------
This part is not part of my proposal. I just wish that the new ANSI
ASCII keyboard has the following keys. One may assign some
function keys for the following purposes. But it goes
without saying that separate keys at the space bar level are more desirable.
For text/graphics terminals
Italic key : italicizes the normal character. This key should be active
only on alphabetic characters and Greek capital characters,
but not on numeric characters or symbols like '%', '+', '"', etc.
On a black-and-white text-mode-only terminal which does not have
ROM to support various fonts (such as VT100),
it would be desirable if this key reverses white and black
of those characters between the two italic keys.
Black becomes white, white becomes black. (Toggle)
Bold key : boldens or highlights a character. (Toggle)
For graphics terminals
Step-up key : moves the position 1/2 line higher; the step-down key then
goes back to the original line height.
Step-down key : moves the position 1/2 line lower; the step-up key then
goes back.
Script key : displays the scripted characters. (Toggle)
Small character key : displays small characters from now and restores the
size back. (Toggle)
As to the Keyboard Layout,
We do not need to have the editing keypad on the right.
Why don't we move it to the left leaving more space for the mouse?
=====================================End of draft=======
P.S. At first, I did not intend to do this as a project. However,
it turned out to be a big project. Now I want to drop
this project and release it to the public by posting it to
the news system here. I hope everybody will express their
opinions and have a fruitful discussion here. And finally, I hope to see the ANSI
or POSIX committee act.
Please start this project and act, ANSI.
Volume-Number: Volume 13, Number 39
From uucp Tue Apr 5 03:28:35 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA05757; Tue, 5 Apr 88 03:28:35 EDT
From: Dave Sill <dsill@NSWC-OAS.ARPA>
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <155@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: Dave Sill <dsill@NSWC-OAS.ARPA>
Date: 4 Apr 88 20:21:29 GMT
Apparently-To: std-unix-archive
Status: O
From: Dave Sill <dsill@NSWC-OAS.ARPA>
In article <10317@uunet.UU.NET> Ghie-Hugh Song <gs732%uxe.cso.uiuc.edu@uxc.cso.uiuc.edu> writes:
>Have you ever dreamed that TeX were more WYSIWYG or that you could type
>Greek characters in the text mode directly? If we had an extended
>256 8-bit ASCII character set such as IBM PC's. (See Appendix of
>PC DOS Manual), things would be much easier.
>
> [Proposed 8-bit ASCII deleted.]
There was a time when I would have supported such a proposal. I'd
particularly like to have such extended characters for use in
programming languages (a left-arrow character for assignment, for
example) and in shells (to take the place of metacharacters such as:
<>*&| et cetera that also have normal meanings as punctuation.)
However, I can't see that any fixed 8-bit or 9-bit or even 16-bit code
will be able to meet future requirements. It would only be a matter
of time before people started to complain about characters not in the
standard set.
I think a more powerful and flexible system based on the current 7-bit
ASCII would be better. Bill Joy, in the April issue of Unix Review,
says PostScript is the new ASCII. I'm inclined to agree that
a PostScript-like language would fill the bill better over the long
run (next 10-20 years) than a fixed code. This would also allow the
internationalization effort to be integrated nicely. Do any
PostScript/TeX/Metafont weenies care to argue the merits of their
favorite system?
Input and output devices capable of handling such extended character
sets are another problem. I don't think ctrl/meta/alt/cokebottle keys
on qwerty keyboards are the best way to go, but neither are keyboards
with hundreds of keys. It would probably be better to have smart
keyboards and drivers that would recognize escape sequences and
replace them with associated extended characters.
Output devices, in general, are more capable. Except for daisy-wheel
printers and dumb terminals, most have graphics capabilities that
could be put to use. For the ever-important backward compatibility,
though, some scheme would have to be devised. Perhaps a pseudo-
PostScript for ASCII-only devices could be devised that would display
everything in one font and get the spacing as close as possible to
what was intended.
Oh well, I'm rambling...
=========
The opinions expressed above are mine.
"I no longer think of something as a computer unless
it's connected to a network."
-- Peter Weinberger
Volume-Number: Volume 13, Number 41
From uucp Tue Apr 5 03:29:49 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA05837; Tue, 5 Apr 88 03:29:49 EDT
From: Guy Harris <guy@Sun.COM>
Newsgroups: comp.std.unix
Subject: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <156@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: guy@Sun.COM (Guy Harris)
Date: 4 Apr 88 21:11:13 GMT
Apparently-To: std-unix-archive
Status: O
From: guy@Sun.COM (Guy Harris)
> I understand that the standardization of 8-bit extended ASCII
> is too late.
You're probably right. There is already a family of ISO extended character
sets, the ISO 8859 family, that are extensions of ASCII and are being adopted
by many UNIX systems. The ISO 8859/1 character set, also known as "ISO Latin
Alphabet #1", has been adopted or will be adopted by a number of vendors for
Western European use; AT&T has adopted it, Sun plans to do so, I believe Apollo
has done so, DEC's international character set is similar to it and they may
have adopted it, and I think X/Open has already specified it.
Volume-Number: Volume 13, Number 42
From uucp Wed Apr 6 13:36:00 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA08155; Wed, 6 Apr 88 13:36:00 EDT
From: std-unix@longway.TIC.COM (Moderator, John S. Quarterman)
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <158@longway.TIC.COM>
References: <10317@uunet.UU.NET>
Reply-To: uunet!harvard.harvard.edu!haddock!karl (Karl Heuer)
Organization: Interactive Systems, Boston
Date: 5 Apr 88 15:47:27 GMT
Apparently-To: std-unix-archive
Status: O
From: uunet!harvard.harvard.edu!haddock!karl (Karl Heuer)
In article <10317@uunet.UU.NET> gs732%uxe.cso.uiuc.edu@uxc.cso.uiuc.edu (Ghie-Hugh Song ) writes:
>Have you ever dreamed that TeX were more WYSIWYG or that you could type
>Greek characters in the text mode directly? If we had an extended
>256 8-bit ASCII character set such as IBM PC's. (See Appendix of
>PC DOS Manual), things would be much easier.
Well, actually I've wished for a *lot* of non-ASCII characters at various
times. More than you can fit in the 128 available slots. But most of them
are so seldom used that I don't mind that they don't have reserved 8-bit
values.
>For the 7-bit terminal environment, in which 8-bit signals are not generated
>or received by the terminal, such as VT100, it is desirable for the C-shell
>or the editor to have a key which tells the host computer that the next key
>is one of the upper 8-bit codes (128-255). This key should not contradict
>with a control key of the existing editor programs.
There is no such key. (Yes, Emacs *does* distinguish between C-m and C-j.
Besides, on most keyboards the big key labeled RETURN or ENTER generates C-m,
so if you preempt that for a pseudo-meta, you'd have to use an explicit C-j
(awkward to type) to get a newline.) Not that it matters -- such editors
normally run in raw mode anyway, so they'd be bypassing the new feature.
>In addition to this new extended ASCII, I think that some of the present
>ASCII characters should be redesigned from the present ones as follows:
>[suggests, among other changes, that /\_| should be stretched to fit the
>character cell]
If you want line-drawing characters, add a line-drawing font. Don't try to
make the ASCII set do double duty.
>... You might have noticed that the UNIX 'man'ual pages contain '^H' in their
>text files for underlining. It seems now fully supported by most ANSI
>terminals.
Oh? Underlining with backspace is not unheard of, but I think the escape
sequence \e[4m is more common, especially among "ANSI terminals". Perhaps
you're confused by software that does this conversion for you (e.g. "more")?
And certainly very few terminals (hardcopy excepted) will display general
overstrikes like a cent sign.
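The conversion mentioned here, from backspace underlining to the \e[4m escape sequence, can be sketched in a few lines of Python. This is not the code of "more" or of any real pager; the function name `overstrike_to_sgr` is hypothetical, and only the '_', backspace, character form of underlining is handled:

```python
import re

# "_", backspace (0x08), then the character to underline, as in formatted man pages
_UNDERLINE = re.compile("_\x08(.)")


def overstrike_to_sgr(text: str) -> str:
    """Rewrite '_ BS X' underlining as ESC[4m X ESC[0m (underline on, then reset)."""
    return _UNDERLINE.sub(lambda m: "\x1b[4m" + m.group(1) + "\x1b[0m", text)


sample = "_\x08N_\x08A_\x08M_\x08E"
print(repr(overstrike_to_sgr(sample)))
```

General overstrikes such as 'c', backspace, '/' for a cent sign have no such escape sequence to map to, which is the point made about video terminals.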
Volume-Number: Volume 13, Number 41
From uucp Thu Apr 7 21:29:39 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA10260; Thu, 7 Apr 88 21:29:39 EDT
From: rja <rja@edison.GE.COM>
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <159@longway.TIC.COM>
References: <10317@uunet.UU.NET> <158@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: rja@edison.GE.COM (rja)
Organization: GE-Fanuc North America
Date: 7 Apr 88 13:01:40 GMT
Apparently-To: std-unix-archive
Status: O
From: rja@edison.GE.COM (rja)
I'd thought someone else would have pointed this out, but nothing has
shown up at edison to correct the misconception. THERE IS AN 8-bit
STANDARD, ISO 8859/1. It also has other variants (8859/2, etc.) which
support other languages. The 8859/1 version supports nearly ALL languages
in use in Western Europe. It is a proper superset of ASCII.
[ Actually, Guy Harris already pointed this out. But your following
observation is quite useful. -mod ]
For more details, please tune into the ongoing discussions in
comp.std.internat, where several people have voiced concerns over
languages NOT supported. Nevertheless, since it is being adopted as an
X/OPEN standard and big companies like AT&T are adopting it, expect it
to take hold.
Volume-Number: Volume 13, Number 47
From uucp Thu Apr 7 22:31:59 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA15921; Thu, 7 Apr 88 22:31:59 EDT
From: Mike Threepoint <linhart@topaz.rutgers.edu>
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <160@longway.TIC.COM>
References: <156@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: linhart@topaz.rutgers.edu (Mike Threepoint)
Organization: The Society for Creative Euthanasia
Date: 7 Apr 88 11:12:59 GMT
Apparently-To: std-unix-archive
Status: O
From: linhart@topaz.rutgers.edu (Mike Threepoint)
Bo Thide (irf@kuling) recently described it [ISO 8859/1 -mod] as 191
characters cleverly designed with capitals coded as shifted minuscules,
including eth (which I'm not sure what it is), thorn, and sharp S.
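Both claims in this description can be checked against Python's built-in "latin-1" codec, which implements ISO 8859-1. The sketch below assumes that "shifted" means the capital sits 0x20 below the small letter, as in ASCII; the helper name `upper_slot` is hypothetical:

```python
def upper_slot(code: int) -> int:
    """Code of the capital for a small letter in e0-fe, found by clearing bit 5."""
    return code & ~0x20


# 95 printable ASCII characters (20-7e) plus 96 in the upper half (a0-ff).
print(len(range(0x20, 0x7F)) + len(range(0xA0, 0x100)))  # 191

for code in range(0xE0, 0xFF):
    if code == 0xF7:  # division sign, paired with the multiply sign, not a letter
        continue
    small = bytes([code]).decode("latin-1")
    capital = bytes([upper_slot(code)]).decode("latin-1")
    assert small.upper() == capital
```

The two letters left out of the pairing are the German double-s at df and y with diaeresis at ff, whose capitals are not in the set.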
To possibly add to the list, this sounds like the character set
Microsoft Windows uses and terms (by no standard I know of) "ANSI".
It has the vowels in acute, grave, circumflex, tilde, and umlaut.
The high bit characters also include cent, pound, yen, and universal
currency symbols, circle-R trademark and circle-C copyright symbols,
inverted ? and !, section and paragraph symbols, << guillemets >>,
several accents, 1/4, 1/2, and 3/4 characters, and superscripted 1, 2,
and 3. The last sound like a bad idea to me, so I actually hope this
is something they threw together themselves.
Sound like ISO 8859? If not, I would be quite interested to know just
what it is. How much do I send to where (if you can't just mail me a
copy)?
What I would also like to see is the ASCII 0..1F (31 dec.) graphic
representations on new machines conform to the ANSI standard. They
might look impractical, but after setting up a font using them on my
micro, it's amazing how much sense they make to me.
--
"Science does not remove the terror of the gods." | Mike Threepoint
-- J.R. "Bob" Dobbs | linhart@topaz.rutgers.edu
"One man's theology is another man's belly laugh." | FidoNet 1:107/513
-- Lazarus Long | AT&T (201)878-0937
Volume-Number: Volume 13, Number 48
From uucp Fri Apr 8 13:29:00 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA09236; Fri, 8 Apr 88 13:29:00 EDT
From: Guy Harris <guy@Sun.COM>
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <161@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: guy@Sun.COM (Guy Harris)
Date: 8 Apr 88 05:38:39 GMT
Apparently-To: std-unix-archive
Status: O
From: guy@Sun.COM (Guy Harris)
> To possibly add to the list, this sounds like the character set
> Microsoft Windows uses and terms (by no standard I know of) "ANSI".
> It has the vowels in acute, grave, circumflex, tilde, and umlaut.
> The high bit characters also include cent, pound, yen, and universal
> currency symbols, circle-R trademark and circle-C copyright symbols,
> inverted ? and !, section and paragraph symbols, << guillemets >>,
> several accents, 1/4, 1/2, and 3/4 characters, and superscripted 1, 2,
> and 3. The last sound like a bad idea to me, so I actually hope this
> is something they threw together themselves.
> Sound like ISO 8859?
Yes. The superscripted letters *do* come from ISO 8859 (see below).
> What I would also like to see is the ASCII 0..1F (31 dec.) graphic
> representations on new machines conform to the ANSI standard. They
> might look impractical, but after setting up a font using them on my
> micro, it's amazing how much sense they make to me.
What "graphic representations" are you referring to? The only ANSI standard I
know of for characters in the range 0x00 to 0x1f is ASCII, which says they're
*control* characters, not *printable* characters.
For your collective amusement, here is a chart of ISO 8859/1 or "ISO Latin
Alphabet #1". This was derived by some quick hacking on the X11 include file
"keysymdef.h" - yes, X11 uses the ISO character sets as well.
non-breaking space 0xa0
inverted exclamation point 0xa1
cent sign 0xa2
pounds sterling 0xa3
"currency symbol" 0xa4
yen 0xa5
broken bar 0xa6
section mark 0xa7
diaeresis 0xa8
copyright 0xa9
feminine ordinal 0xaa
(this is a superscripted lower-case "a", underlined)
left guillemet 0xab
(French left quote, looks like small "<<")
not sign 0xac
hyphen 0xad
registered trademark 0xae
macron 0xaf
(an elevated small horizontal bar)
degree symbol 0xb0
plus/minus 0xb1
superscript 2 0xb2
superscript 3 0xb3
acute accent 0xb4
mu 0xb5
paragraph symbol 0xb6
small centered dot 0xb7
cedilla 0xb8
superscript 1 0xb9
masculine ordinal 0xba
(this is a superscripted lower-case "o", underlined)
right guillemet 0xbb
(French right quote, looks like small ">>")
1/4 0xbc
1/2 0xbd
3/4 0xbe
inverted question mark 0xbf
A with grave accent 0xc0
A with acute accent 0xc1
A with circumflex accent 0xc2
A with tilde 0xc3
A with diaeresis 0xc4
A with ring 0xc5
(as in "Angstrom")
AE diphthong 0xc6
C with cedilla 0xc7
E with grave accent 0xc8
E with acute accent 0xc9
E with circumflex accent 0xca
E with diaeresis 0xcb
I with grave accent 0xcc
I with acute accent 0xcd
I with circumflex accent 0xce
I with diaeresis 0xcf
upper-case eth 0xd0
(eth is an Icelandic letter)
N with tilde 0xd1
O with grave accent 0xd2
O with acute accent 0xd3
O with circumflex accent 0xd4
O with tilde 0xd5
O with diaeresis 0xd6
multiply sign 0xd7
O with slash 0xd8
U with grave accent 0xd9
U with acute accent 0xda
U with circumflex accent 0xdb
U with diaeresis 0xdc
Y with acute accent 0xdd
upper-case thorn 0xde
(thorn is an Icelandic letter)
German double-s 0xdf
a with grave accent 0xe0
a with acute accent 0xe1
a with circumflex accent 0xe2
a with tilde 0xe3
a with diaeresis 0xe4
a with ring 0xe5
(lower-case "A with ring")
ae diphthong 0xe6
c with cedilla 0xe7
e with grave accent 0xe8
e with acute accent 0xe9
e with circumflex accent 0xea
e with diaeresis 0xeb
i with grave accent 0xec
i with acute accent 0xed
i with circumflex accent 0xee
i with diaeresis 0xef
lower-case eth 0xf0
n with tilde 0xf1
o with grave accent 0xf2
o with acute accent 0xf3
o with circumflex accent 0xf4
o with tilde 0xf5
o with diaeresis 0xf6
division sign 0xf7
o with slash 0xf8
u with grave accent 0xf9
u with acute accent 0xfa
u with circumflex accent 0xfb
u with diaeresis 0xfc
y with acute accent 0xfd
lower-case thorn 0xfe
y with diaeresis 0xff
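For readers who want to check entries in this chart, Python's standard "latin-1" codec decodes ISO 8859-1 bytes, and `unicodedata.name` gives the present-day Unicode name of each character. A small sketch; the helper name `latin1_name` is hypothetical, and the Unicode names are modern ones rather than the wording of the 1988 chart:

```python
import unicodedata


def latin1_name(code: int) -> str:
    """Unicode name of the character at a given ISO 8859-1 code."""
    return unicodedata.name(bytes([code]).decode("latin-1"))


for code in (0xA3, 0xD7, 0xE9, 0xF0, 0xF7):
    print(f"0x{code:02x}  {latin1_name(code)}")

# The lower half is plain ASCII, so the set is a superset of ASCII.
ascii_half = bytes(range(0x80))
assert ascii_half.decode("latin-1") == ascii_half.decode("ascii")
```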
Volume-Number: Volume 13, Number 49
From uucp Sat Apr 9 19:58:06 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA04772; Sat, 9 Apr 88 19:58:06 EDT
From: std-unix@longway.TIC.COM (Moderator, John S. Quarterman)
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Summary: ISO 6937
Message-Id: <162@longway.TIC.COM>
References: <161@longway.TIC.COM>
Reply-To: uunet!rutgers.edu!mtune!homxb!hrs (H.SILBIGER)
Organization: AT&T Bell Laboratories, Holmdel
Date: 9 Apr 88 13:57:53 GMT
Apparently-To: std-unix-archive
Status: O
From: uunet!rutgers.edu!mtune!homxb!hrs (H.SILBIGER)
In article <161@longway.TIC.COM>, guy@Sun.COM (Guy Harris) writes:
> From: guy@Sun.COM (Guy Harris)
>
> > currency symbols, circle-R trademark and circle-C copyright symbols,
> > inverted ? and !, section and paragraph symbols, << guillemets >>,
> > and 3. The last sound like a bad idea to me, so I actually hope this
> > Sound like ISO 8859?
>
> Yes. The superscripted letters *do* come from ISO 8859 (see below).
>
>
There is another ISO standard that handles all latin alphabets, known as
ISO 6937. There is a CCITT equivalent.
This character set is characteristically used in text communication
applications, such as document architecture, teletex, message handling, etc.
ISO 8859 is used mainly in the computer processing environment.
[ Because ISO 6937 buys extreme flexibility by composing characters as
two-byte combinations of basic character and accent, while ISO 8859
encodes every character as one byte. I saw this on comp.std.internat,
which I recommend everybody interested in this discussion should read. -mod ]
Herman Silbiger batavier!hrs@ATT.COM
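The difference the moderator describes can be illustrated with a small decoder. This is only a sketch under stated assumptions: the accent codes below (0xC1 grave, 0xC2 acute, 0xC3 circumflex, 0xC4 tilde, 0xC8 diaeresis) are my recollection of the ISO 6937 non-spacing accent column and should be checked against the standard; the accent byte is assumed to precede the base letter; and bytes below 0x80 are treated as plain ASCII, which is a simplification. The names `ACCENTS` and `decode_6937_subset` are hypothetical.

```python
import unicodedata

# Assumed, partial table: ISO 6937 non-spacing accent byte -> Unicode combining mark.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
}


def decode_6937_subset(data: bytes) -> str:
    """Decode ASCII plus two-byte accent+letter pairs into precomposed text."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in ACCENTS:
            if i + 1 >= len(data):
                raise ValueError("accent byte with no base letter")
            base = chr(data[i + 1])
            # Compose base letter and combining mark into a single character.
            out.append(unicodedata.normalize("NFC", base + ACCENTS[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02x} is outside this sketch's subset")
    return "".join(out)


text = decode_6937_subset(b"caf\xc2e")   # accent byte, then the letter e
print(text, text.encode("latin-1"))      # the same letter is one byte in ISO 8859/1
```

The two-byte pair costs an extra byte per accented letter, but any accent can be placed on any base letter; ISO 8859/1 spends exactly one byte and can represent only the combinations listed in its table.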
Volume-Number: Volume 13, Number 50
From uucp Thu Apr 14 05:18:56 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA24658; Thu, 14 Apr 88 05:18:56 EDT
From: Peter da Silva <peter%sugar.UUCP@uunet.UU.NET>
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Message-Id: <164@longway.TIC.COM>
References: <156@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: sugar!peter (Peter da Silva)
Organization: Sugar Land UNIX - Houston, TX
Date: 14 Apr 88 00:54:17 GMT
Apparently-To: std-unix-archive
Status: O
From: peter@sugar.UUCP (Peter da Silva)
> From: guy@Sun.COM (Guy Harris)
> The ISO 8859/1 character set, also known as "ISO Latin
> Alphabet #1", has been adopted or will be adopted by a number of vendors for
> Western European use; AT&T ... Sun ... Apollo ... DEC ...
Add the Commodore Amiga and (I think) the Atari ST to the list...
--
-- Peter da Silva `-_-' ...!hoptoad!academ!uhnix1!sugar!peter
-- "Have you hugged your U wolf today?" ...!bellcore!tness1!sugar!peter
-- Disclaimer: These aren't mere opinions, these are *values*.
Volume-Number: Volume 13, Number 51
From uucp Fri Apr 15 13:46:37 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA08154; Fri, 15 Apr 88 13:46:37 EDT
From: std-unix@longway.TIC.COM (Moderator, John S. Quarterman)
Newsgroups: comp.std.unix
Subject: Re: 8-Bit ASCII Standard on UNIX-POSIX
Summary: ANSI X3.32 "Graphic Representation of the Control Characters of ASCII"
Keywords: Yale, Master...
Message-Id: <165@longway.TIC.COM>
References: <161@longway.TIC.COM>
Reply-To: uunet!topaz.rutgers.edu!linhart (Mike Threepoint)
Organization: The Society for Creative Euthanasia
Date: 15 Apr 88 01:40:47 GMT
Apparently-To: std-unix-archive
Status: O
From: uunet!topaz.rutgers.edu!linhart (Mike Threepoint)
u <- guy@Sun.COM (Guy Harris)
When we last left our conversation:
[description of MS Windows "ANSI" character set]
me> Sound like ISO 8859?
u> Yes. The superscripted letters *do* come from ISO 8859 (see below).
Thanks to your table, I can confirm that they match. Multiply and
divide symbols are missing, replaced by the ubiquitous empty box, but
now my EGA font is complete (for my IBM compatible, made with CHET, I
could post it to binaries if anyone really wanted it).
me> What I would also like to see is the ASCII 0..1F (31 dec.) graphic
me> representations on new machines conform to the ANSI standard. They
me> might look impractical, but after setting up a font using them on my
me> micro, it's amazing how much sense they make to me.
u> What "graphic representations" are you referring to? The only ANSI standard
u> I know of for characters in the range 0x00 to 0x1f is ASCII, which says
u> they're *control* characters, not *printable* characters.
The rather obscure ANSI X3.32-1973 "Graphic Representation of the
Control Characters of ASCII" defines them for use when the
name-in-tiny-letters isn't used. I found it in Joe Campbell's "C
Programmer's Guide to Serial Communications" (Howard W. Sams & Co.)
[wish I had a copy of my own, had to reborrow it to get the ID] which
contains an ASCII chart poster with those symbols on it.
It goes something like this: (they're in that aforementioned font, too)
^@ NUL a hollow square, like the one used in Mac character sets for
undefined characters
^A SOH the left column and top row of the cell set, like an
inverted L
^B STX the bottom row and center column set, like a
perpendicular symbol
^C ETX the right column and bottom row set, like a reversed L
^D EOT a single zig-zag, like a lightning bolt
^E ENQ a square with an X in it
^F ACK a check mark (tick mark to you Europeans)
^G BEL a hemisphere with two L shaped feet, I want to say
"doodlebug", but it's probably more like an
electronic component
^H BS an up arrow bent over leftwards into a U-shaped hook
at the top
^I HT a right arrow with the barbs extended to the length of
the shaft, more like a dart
^J LF three parallel horizontal lines
^K VT a downward pointing dart [Campbell says that instead
of overloading LF with NewLine worsening the present
incompatibilities, ANSI should have redefined this
almost totally unused character]
^L FF a down dart with a second arrowhead midway down its
shaft
^M CR a left dart
^N SO a circle with an X in it
^O SI a circle with a dot in the center
^P DLE a square with a horizontal line through the middle
^Q DC1 a circle with the top right quarter sectioned off,
that is, lines from the center to the top and right
^R DC2 same, but lower right quarter
^S DC3 same, but lower left quarter
^T DC4 same, but top left quarter
^U NAK a check (tick) mark with a horizontal line thru the
center
^V SYN a rectangle with the bottom cut in half and turned
outward, like a bottomless rectangle with feet
^W ETB the right column and center row set, a T on its side
^X CAN a down pointing hollow triangle on an up pointing one,
like an hourglass
^Y EM a vertical line with a fat dot in the middle
^Z SUB a backwards ?
^[ ESC a circle with a line through the center
^\ FS a square box with the top left quarter sectioned off
^] GS same, but bottom left
^^ RS same, but bottom right
^_ US same, but top right
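The ^X labels in the list above follow the usual caret notation: the control code with bit 0x40 flipped gives the printable character shown after the caret. A minimal sketch that regenerates the first two columns of the list (the glyph descriptions themselves come only from the book and are not reproduced here); `NAMES` and `label` are names chosen for this example.

```python
# Standard ASCII control-character mnemonics, in code order 0x00..0x1f.
NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
         "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()


def label(code: int) -> str:
    """Return caret notation and mnemonic for a control code, e.g. '^G BEL'."""
    if not 0x00 <= code <= 0x1F:
        raise ValueError("not a control character in the range 0x00..0x1f")
    return "^" + chr(code ^ 0x40) + " " + NAMES[code]


for code in range(0x20):
    print(label(code))
```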
u> For your collective amusement, here is a chart of ISO 8859/1 or "ISO Latin
u> Alphabet #1". This was derived by some quick hacking on the X11 include
u> file "keysymdef.h" - yes, X11 uses the ISO character sets as well.
[table deleted]
Thanx, I appreciate it. I'm still interested in 8859-3 (which I read
supports Esperanto), if it's not too much trouble could you tell me
a) what _its_ layout is, or b) how much to send to where?
--
"...billions and billions..." | Mike Threepoint (D-ro 3)
-- not Carl Sagan | linhart@topaz.rutgers.edu
"...hundreds if not thousands..." | FidoNet 1:107/513
-- Pnews | AT&T +1 (201)878-0937
Volume-Number: Volume 13, Number 52
From uucp Fri Apr 15 14:26:37 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA09275; Fri, 15 Apr 88 14:26:37 EDT
From: David Brooks <BROOKS@CSSS-A.PRIME.COM>
Newsgroups: comp.std.unix
Subject: Re: 8-bit discussion in std-unix
Message-Id: <167@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: BROOKS@CSSS-A.PRIME.COM (David Brooks)
Date: 13 Apr 88 16:22:44 GMT
Apparently-To: std-unix-archive
Status: O
From: BROOKS@CSSS-A.PRIME.COM (David Brooks)
(this may be a duplicate. I don't trust our mailers to figure out
longway.UUCP...)
[ Judging by the non-standard headers your mailer sent, it's no wonder....
See the next article for std-unix/comp.std.unix posting information. -mod ]
Maybe you can get a comment from the ANSI committee (X3L2). It labored
for some years to produce the 8-bit set, which was then accepted lock,
stock and barrel by ISO for 8859/1. ECMA was an intermediary.
I was following the ANSI deliberations from a distance; as a result the
Prime PT200 was, I think, the first terminal to implement the set. The
set is something of a compromise. The ugly inclusion of multiply and
divide amongst the "o"s is particularly weird; originally these
contained the OE diphthongs in deference to the French. But, to
everyone's surprise, the French didn't want them!
Nobody, but nobody, knows how to design eth and thorn. If any
Icelanders(?) would post a bitmap AND a PostScript definition of these
four glyphs, many of us would be grateful.
And there IS a standard way of invoking 8-bit characters in a 7-bit
environment: use Shift-out and Shift-in. These are control-N and
control-O respectively; they can't be typed directly in EMACS and would
confuse any software that assumes <one byte> = <one character position>,
but receiving terminals should do the right thing.
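A rough sketch of the Shift-out/Shift-in scheme just described, assuming the receiver already has the right-hand half of the 8-bit set (0xa0 to 0xff) designated as its shifted set; the designation escape sequences are not shown. The function names `to_7bit` and `from_7bit` are invented for this example, and control-1 codes (0x80 to 0x9f) are rejected rather than translated.

```python
SO, SI = 0x0E, 0x0F  # Shift-out (control-N) and Shift-in (control-O)


def to_7bit(data: bytes) -> bytes:
    """Encode 8-bit text for a 7-bit channel, bracketing high characters with SO/SI."""
    out = bytearray()
    shifted = False
    for b in data:
        if b in (SO, SI):
            raise ValueError("SO/SI are reserved for shifting")
        if 0x80 <= b <= 0x9F:
            raise ValueError("control-1 codes are not handled in this sketch")
        high = b >= 0xA0
        if high != shifted:
            out.append(SO if high else SI)
            shifted = high
        out.append(b & 0x7F)  # strip the eighth bit
    if shifted:
        out.append(SI)  # leave the channel in the unshifted state
    return bytes(out)


def from_7bit(data: bytes) -> bytes:
    """Reverse to_7bit: restore the eighth bit on characters sent while shifted."""
    out = bytearray()
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and b >= 0x20:
            out.append(b | 0x80)
        else:
            out.append(b)
    return bytes(out)


encoded = to_7bit(b"caf\xe9")
print(encoded)             # b'caf\x0ei\x0f'
print(from_7bit(encoded))  # b'caf\xe9'
```

The encoded form is two bytes longer than the original four characters, which is exactly what trips up software that assumes one byte per character position.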
Assigning graphics to control characters is highly non-standard.
[ As someone just pointed out, apparently there is an obscure standard. -mod ]
IBM and Mac assign graphics to the control-1 set (hex 80 to 9F). Actually
I wish someone would properly implement the control-1 characters, like
reverse linefeed, halfline up and down, CSI...
__ __
/ ) / / ) /
/ / __ _ o __/ /--/ __ ________/_) _
/__/ (_(_|/ (__(_( /__/ / (_/_/ /_/ / \_/_)_
Internet: BROOKS@CSSS-A.PRIME.COM
uucp: {mit-eddie,necntc}!primerd!csss-a.prime.com!brooks
Standard disclaimer applies as appropriate.
Volume-Number: Volume 13, Number 54
From uucp Sun Apr 17 13:49:25 1988
Received: by uunet.UU.NET (5.54/1.14)
id AA07114; Sun, 17 Apr 88 13:49:25 EDT
From: rja <rja@edison.GE.COM>
Newsgroups: comp.std.unix
Subject: Eth and Thorn characters
Keywords: Icelandic, Vietnamese, ISO8859/1
Message-Id: <170@longway.TIC.COM>
Sender: std-unix@longway.TIC.COM
Reply-To: rja@edison.GE.COM (rja)
Organization: GE-Fanuc North America
Date: 16 Apr 88 18:22:53 GMT
Apparently-To: std-unix-archive
Status: O
From: rja@edison.GE.COM (rja)
There has been a flurry of postings (primarily in comp.std.unix)
about ISO 8859/1 (8 bit character set for western European languages).
A recent posting has inquired about what the Eth and Thorn characters
look like.
Eth is similar in appearance to the letter D. However, eth has an additional
horizontal line about halfway vertically up on the left-hand vertical stroke.
This horizontal line extends to the left of where a D would stop and goes
halfway between the curved and vertical strokes of the D. The horizontal
stroke is symmetric with respect to the vertical left-hand line of a D.
Lowercase eth follows the same pattern with respect to the d, except that
the horizontal line is about 3/4 up the vertical line of a "d". The line
should be halfway between the top of the vertical line and the place where
the round part of the "d" meets the straight part (upper connection).
Upper- and lower-case thorn follow this same pattern, with "p" and "P" in
place of every "d" and "D" above. The lowercase thorn has
its horizontal line on the stem as part of the descender and the upper case
thorn has its horizontal line on the stem rather than between the intersections
of the straight and vertical lines forming the top of the "P".
While these characters are originally Icelandic/Norse, the Eth characters are
also used in Vietnamese (Quoc Ngu). Vietnamese is normally written using a
Roman-style script that has an amazing number of diacritical marks, so it isn't
quite handled by ISO 8859/1. ISO 8859/1 does come close though....
I haven't any reasonable way to generate a bit-map or Postscript image, but
these descriptions should get the general idea across to the folks at Prime,
DEC, etc. so they can implement them.
I'd be interested in getting mail from anyone who knows if a standard character
set exists for Vietnamese.
______________________________________________________________________________
rja@edison.GE.COM or ...uunet!virginia!edison!rja
"Noalias must go, this is non-negotiable." DMR
______________________________________________________________________________
Volume-Number: Volume 13, Number 56