- Path: sparky!uunet!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!ira.uka.de!fauern!uni-erlangen.de!not-for-mail
- From: unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn)
- Newsgroups: comp.std.internat
- Subject: Re: ISO Latin 1 to 7-bit ASCII conversion (final draft!)
- Date: 18 Dec 1992 19:14:12 +0100
- Organization: Regionales Rechenzentrum Erlangen
- Message-ID: <1gt4dkEINNi5i@uni-erlangen.de>
- References: <1gi1rnEINN1cg@uni-erlangen.de> <1992Dec16.165027.9152@admin.kth.se>
- Reply-To: mskuhn@immd4.informatik.uni-erlangen.de
- NNTP-Posting-Host: cd4680fs.rrze.uni-erlangen.de
- Lines: 152
- Keywords: character sets, ISO 8859-1, terminals, user interface
-
- ojarnef@admin.kth.se, psv@nada.kth.se (Olle Jarnefors) writes:
-
- >> Users should know if the text they read has been converted from the
- >> original Latin 1 text. ...
-
- >Do you have in mind any specific way of visually indicating that
- >conversion takes place? Underlining converted characters?
- >Something else?
-
- I only had in mind a good explanation in the documentation and a reminder
- message at program start. Extra characters like [(c)] and [x]
- make the replacements even longer and damage the layout even more.
- In some applications they might be of use, so I'll describe them
- as a possible option. If underlining etc. is possible, that would be
- more attractive. But powerful terminals that allow underlining,
- bold face etc. often also provide ISO 8859-1, and then we have the
- only REAL solution for the whole problem.
-
- BTW: Kermit translates <copyright> to @, which looks similar, but it has
- already confused me a lot. Then again, reading USENET articles about
- transcriptions through a transcription system is always very confusing.
-
- >> ... This avoids confusion if e.g. someone asks for
- >> sending him a 3<fraction 1/2>" disk [3="], which will be displayed
- >> after the conversion as 31/2" (= 15.5").
-
- >This particular problem is most easily solved, we suggest, by
- >converting the character not to "1/2" but to " 1/2", with an
- >initial space character.
-
- This was only one example from a long list of possible problems that
- can't be solved by a mapping that is not 1-1. 1-1 mappings
- (e.g. [a:] according to RFC 1345) have the problem that you need
- an escape mechanism for pure ASCII sequences like [, a, : and ].
- This modifies even 7-bit texts, and that was not
- my intention. I don't want to design a strict encoding, just something
- that makes reading e.g. 8-bit USENET articles easier on old terminals.
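-
- To illustrate the trade-off, here is a minimal Python sketch. The function
- to_ascii and both one-entry tables are made up for this example; they are
- not the Latin1toASCII function or its tables. It shows that the readable
- replacement is not reversible, and that a bracketed RFC 1345-style mnemonic
- collides with plain ASCII text unless "[" is escaped as well.

```python
def to_ascii(text, table):
    """Replace every character found in `table`; leave all others untouched."""
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical one-entry tables, for illustration only.
READABLE = {"\u00bd": "1/2"}    # VULGAR FRACTION ONE HALF, short and readable
BRACKETED = {"\u00e4": "[a:]"}  # a with diaeresis as a bracketed mnemonic

# Readable mapping: two different inputs give the same output,
# so the original text cannot be recovered.
print(to_ascii('3\u00bd"', READABLE) == to_ascii('31/2"', READABLE))  # True

# Bracketed mapping: without escaping "[", a literal "[a:]" in a pure
# 7-bit text is indistinguishable from a converted character.
print(to_ascii("\u00e4", BRACKETED) == to_ascii("[a:]", BRACKETED))  # True
```

- Making the second mapping truly 1-1 would mean rewriting every literal "["
- in the input, which is exactly the change to 7-bit texts described above.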
-
- >Two of the "high" characters of ISO 8859-1
-
- >160 "A0 '240 NO-BREAK SPACE (NBSP)
- >173 "AD '255 SOFT HYPHEN (SHY)
-
- >are not ordinary graphic characters but a sort of hybrid
- >characters with both a graphic component and a control
- >component.
-
- >For soft hyphen the graphic component is an ordinary hyphen
- >glyph. The functional component is that this glyph should only
- >be displayed or printed if the character is at the end of a
- >line. If it is somewhere else in the line, _nothing_ should be
- >displayed or printed.
-
- I agree with you completely here, and that is how I would use these
- characters if I had to develop a simple text editor with a few
- word processing functions. WordStar users will be very familiar with
- the SHY and NBSP characters. But the text of ISO 8859-1:1987(E) does
- not define the functionality you describe in your second and third sentences.
-
- >In the simple, context-insensitive conversion that we are
- >dealing with here, SHY should be converted to the empty string,
- >since it will occur less often at the end of a line than
- >elsewhere.
-
- NO! ISO 8859-1 and I absolutely disagree with you here. SHY has to be
- displayed as something similar to a hyphen. If you remove only SHYs that are
- not at the end of the line or are not followed by a space, then this
- might be acceptable, but please NEVER remove SHYs at the end of the line,
- not even in the trivial context-insensitive case that I selected in order
- to keep things simple, in the hope that many PD developers will use the system.
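-
- The two acceptable behaviours can be sketched in Python as follows. The
- function name convert_shy and its context_sensitive flag are hypothetical,
- and treating the end of the text as an end of line is an assumption of this
- sketch. The default is the simple context-insensitive rule (SHY always
- becomes "-"); the optional variant drops a SHY only when it is not followed
- by whitespace or a line end.

```python
SHY = "\u00ad"  # SOFT HYPHEN, decimal 173


def convert_shy(text, context_sensitive=False):
    """Replace SOFT HYPHEN by "-"; optionally drop it in mid-line positions."""
    if not context_sensitive:
        # Simple rule: every SHY is shown as a hyphen.
        return text.replace(SHY, "-")

    out = []
    for i, ch in enumerate(text):
        if ch != SHY:
            out.append(ch)
            continue
        # Assumption: running off the end of the text counts as a line end.
        following = text[i + 1] if i + 1 < len(text) else "\n"
        if following.isspace():
            out.append("-")  # never remove a SHY at a line end or before a space
        # otherwise the SHY is dropped
    return "".join(out)


print(convert_shy("co\u00adoperate"))                             # co-operate
print(convert_shy("co\u00adoperate", context_sensitive=True))     # cooperate
print(repr(convert_shy("co\u00ad\noperate", context_sensitive=True)))  # 'co-\noperate'
```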
-
- >For TABLE 0 we suggest the following changes:
-
- >0b: 173 "AD '255 SOFT HYPHEN (SHY)
- > Now: "-"
- > Suggestion: ""
-
- No, see above.
-
- >0c: 175 "AF '257 MACRON
- > Now: SUBST
- > Suggestion: "-"
-
- My first suggestion was " ", but Steve Summit insisted on SUB. Perhaps
- "-" is the best solution, especially if MACRON becomes popular for
- underlining the previous line.
-
- >0d: 176 "B0 '260 DEGREE SIGN
- > Now: SUBST
- > Suggestion: "o"
-
- > This is most often used in numerical data and can, without
- > risk of misunderstanding, be substituted with the lowercase
- > "o", as is often done.
-
- A better suggestion was " ", as 25 C and 23 34' 44'' will still be understood.
- I'll change this to " ".
-
- >0e: 188 "BC '274 VULGAR FRACTION ONE QUARTER
- > Now: "1/4"
- > Suggestion: " 1/4"
-
- One of my goals was to keep the length at 3 characters or fewer. There are
- many other strings that might cause similar confusion. In a context-sensitive
- system, this would certainly make sense.
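-
- A length goal like this is easy to check mechanically. The Python sketch
- below assumes the goal means "at most 3 characters"; the names MAX_LEN,
- TABLE_0_EXCERPT and too_long are hypothetical, and the excerpt contains only
- replacements mentioned in this thread, not the full TABLE 0.

```python
MAX_LEN = 3  # assumed reading of the length goal: at most 3 characters

# Excerpt only; the complete table is not reproduced here.
TABLE_0_EXCERPT = {
    "\u00ad": "-",    # 173 SOFT HYPHEN
    "\u00b0": " ",    # 176 DEGREE SIGN
    "\u00bc": "1/4",  # 188 VULGAR FRACTION ONE QUARTER
}


def too_long(table, limit=MAX_LEN):
    """Return the entries whose replacement string exceeds `limit` characters."""
    return {ch: rep for ch, rep in table.items() if len(rep) > limit}


print(too_long(TABLE_0_EXCERPT))      # {}
print(too_long({"\u00bc": " 1/4"}))   # the 4-character suggestion is flagged
```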
-
- > DIVISION SIGN
- > Suggestion: "-:"
-
- > This symbol has the meaning of subtraction in some countries
- > and some application fields. In addition, division is
- > in some countries normally indicated by "/" rather than ":".
- > We therefore suggest that the conversion should be neutral
- > by trying to approximate the appearance of the symbol,
- > rather than its meaning. "-:" is better than ":-", since
- > the "-" can't be misinterpreted as a minus on a following
- > number.
-
- I didn't know this, as both DIVISION SIGN and : are used in Germany
- for division. "-:" seems quite artificial, so if ":" really causes
- much confusion, SUB may be better here. Which countries use ":" for
- subtraction?
-
- >1f: 187 "BB '273 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
- > Now: '>'
- > Suggestion: '"'
-
- Why?
-
- >1g: 188 "BC '274 VULGAR FRACTION ONE QUARTER
- > Now: SUBST
- > Suggestion: "/"
-
- > By using "/" instead of the general fallback character at
- > least we indicate that the real character was a vulgar
- > fraction.
-
- The important information has been lost, and I would prefer one single
- fallback character.
-
- Thank you for your comments. I'll include at least some of them in my text.
-
- BTW: There is a serious bug in the Latin1toASCII function and only one
- person has detected it so far ...
-
- Markus
-
- --
- Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
- Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
- ----- Anyone participating in the use of MS-DOS, Heroin or Cocaine is -----
- ---- simply not getting the most out of life possible. (Brian Downing) ----
-