- Path: sparky!uunet!news.claremont.edu!ucivax!news.service.uci.edu!unogate!mvb.saic.com!info-tex
- From: mackay@cs.washington.edu (Pierre MacKay)
- Newsgroups: comp.text.tex
- Subject: Inputting accented characters (postpositive accents)
- Message-ID: <9301211717.AA18234@june.cs.washington.edu>
- Date: Thu, 21 Jan 93 09:17:14 -0800
- Organization: Info-Tex<==>Comp.Text.Tex Gateway
- X-Gateway-Source-Info: Mailing List
- Lines: 70
-
- In principle, this is an excellent idea, and I have pushed for it since
- way back in the late 60s, but it is not without its difficulties.
-
- Knuth's accent primitive was never intended for large amounts of
- running text, as he makes clear in several places where he
- comments on it. It was to provide the necessary low-level
- solution to the immediate problem (fairly acute for a mathematician)
- of citations from European literature. He never contemplated
- keyboarding {\it Les Miserables} with the accent primitive.
-
- When you propose a postpositive convention, however, you have to make a
- distinction between keystrokes and input coding. The editor you or
- your clients use may very well manage the handling of keystrokes in
- the way you intend, but when you send those keystrokes to a program as
- codes, you may get unexpected results. How do you distinguish
- ``Souse'' (in quotes) from ``Souse'', with (sic) "accent grave
- over the E" (W. C. Fields, in {\it The Bank Dick})? The easy answer
- (too easy) is to say that the French use << >> and not `` '', but
- the French are not the only users of e-acute. Unicode has moved in the
- direction of postpositive accent conventions, and when all or most of us have
- 16-bit character software I think the decision will be seen as very much
- the correct one, but we have to live with 8-bit.
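-
- The collision with quotation marks can be made concrete. What follows is
- a minimal sketch, assuming a hypothetical naive convention in which ' or `
- typed directly after a vowel is read as an accent on that vowel. The names
- naive_decode, NAIVE_ACCENTS and VOWELS are invented for illustration.

```python
# Hypothetical naive convention (an assumption for illustration, not a scheme
# proposed in this post): ' or ` typed directly after a vowel is an accent.
NAIVE_ACCENTS = {"'": "\\'", "`": "\\`"}   # keystroke -> TeX accent command
VOWELS = set("aeiouAEIOU")


def naive_decode(text):
    """Rewrite vowel + accent-mark pairs as TeX accent commands."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in VOWELS and nxt in NAIVE_ACCENTS:
            out.append(NAIVE_ACCENTS[nxt] + ch)   # e' becomes \'e
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


quoted = "``Souse''"             # the plain word inside TeX double quotes
decoded = naive_decode(quoted)   # the first closing ' is taken as an accent
```

- Here decoded is ``Sous\'e' and the closing quotation mark has lost half of
- itself to a spurious accent, so the decoder cannot tell the quoted word
- from the accented one.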
-
- About the only way to isolate a sufficient number of accents so they can
- be used in a low-level "glass tty" environment as well as on special
- and usually proprietary systems is to provide an intermediate code.
- I use the + sign, because the mathematical symbol, when needed, can and should
- always be set in math mode ($+$). But no one wants to type e+' for every
- occurrence of e-acute, nor should they. The trigraph convention is there
- as a backup, in case you have to edit a text from a plain ASCII terminal
- running nothing but communications software (as I am right now). The editor
- program used by a multilingual author ought to provide a lot of preaccented
- glyphs tied to unique eight-bit (ultimately 16-bit) codes, and the translation
- to trigraphs ought to be invisible to the user 99 44/100% of the time.
- We do that with both Greek and Turkish. The PC-Write Turkish editor produces
- many codes in the 0x80-0xFF range, and those are invisibly translated
- by the TeX macro file to digraphs and trigraphs in a postpositive
- accent convention. The digraph convention is there so that files can
- be sent through the mail, and so that last-minute editing can be done
- on displays that know nothing of the PC-Write Turkish convention.
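-
- The translation from eight-bit codes to trigraphs might look roughly like
- the sketch below. It is written as a standalone Python filter rather than
- as the TeX macro file described above, and it rests on assumptions: the
- byte values are borrowed from IBM code page 437 purely as placeholders
- (the actual PC-Write Turkish assignments are not given here), and the
- spellings a+` and e+` are extrapolated from the e+' example. TRIGRAPHS
- and to_seven_bit are hypothetical names.

```python
# Placeholder table: byte values taken from code page 437 for illustration
# only; the trigraph spellings follow the letter, +, accent pattern.
TRIGRAPHS = {
    0x82: b"e+'",   # e-acute
    0x85: b"a+`",   # a-grave (assumed spelling)
    0x8A: b"e+`",   # e-grave (assumed spelling)
}


def to_seven_bit(data: bytes) -> bytes:
    """Replace each byte in 0x80-0xFF with its 7-bit trigraph."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                 # plain ASCII passes through
        elif b in TRIGRAPHS:
            out += TRIGRAPHS[b]           # preaccented glyph -> trigraph
        else:
            # Refuse to guess: an unmapped code is an error, not a blank.
            raise ValueError(f"no trigraph for byte 0x{b:02X}")
    return bytes(out)


mailable = to_seven_bit(b"caf\x82")   # safe to mail or edit on a glass tty
```

- With this table, mailable holds the seven-bit text cafe+' and the quoted
- word ``Souse'' would pass through untouched, since it contains no +.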
-
- Of course, there have to be fonts with ligature tables that interpret
- digraphs and trigraphs and produce accented TeX characters. That assumes
- the existence of a font with composite accented characters, such as
- the DC (Cork) encoded fonts being developed at the present time.
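-
- How a ligature table can absorb a trigraph is easiest to see in a small
- model. The sketch below is a Python simulation, not a real font metric
- file: it assumes a table of pairwise rules in which e followed by +
- yields an intermediate slot, and that slot followed by an accent mark
- yields the composite character. LIGTABLE, apply_ligatures and the
- intermediate slot name "e+" are hypothetical.

```python
# Pairwise rules: (character already set, next character) -> replacement.
# A trigraph needs two steps, so an intermediate slot stands for "e+".
LIGTABLE = {
    ("e", "+"): "e+",          # intermediate slot (hypothetical)
    ("e+", "'"): "\u00e9",     # e-acute composite
    ("e+", "`"): "\u00e8",     # e-grave composite
}


def apply_ligatures(chars):
    """Fold each incoming character into the previous one when a rule fits."""
    out = []
    for ch in chars:
        if out and (out[-1], ch) in LIGTABLE:
            out[-1] = LIGTABLE[(out[-1], ch)]   # replace the pair in place
        else:
            out.append(ch)
    return out


set_word = "".join(apply_ligatures(list("cafe+'")))
```

- In this model set_word comes out as the four characters of the accented
- word, with the trigraph reduced to a single composite slot. A real font
- would also need a sensible glyph in the intermediate slot for the case
- where no accent mark follows.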
-
- If you have an editor whose display codes exactly match the DC encoding,
- and a set of display fonts that will produce the correct glyphs,
- there is nothing left to do but make sure that your clients are
- comfortable with the keystroke sequence. Otherwise they can
- use some piped interpretation such as
-
- MS Word | translator | TeX.
-
- The important point is that the choice of keystroke sequence for the
- comfort of the author is not really TeX's business. It is the business
- of an editor or a "Word Processor". Once an unambiguous coding has been
- established within the editor, it is relatively easy to interpret that coding
- in a way that TeX will handle effectively, but the author usually does
- not need to worry about the specifics of the interpretation.
-
-
- Email concerned with UnixTeX distribution software should be sent primarily
- to: elisabet@max.u.washington.edu Elizabeth Tachikawa
- otherwise to: mackay@cs.washington.edu Pierre A. MacKay
- Smail: Northwest Computing Support Center Resident Druid for
- Thomson Hall, Mail Stop DR-10 Unix-flavored TeX
- University of Washington
- Seattle, WA 98195
- (206) 543-6259
-