- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!DKNKURZ1.BITNET!RZOTTO
- Message-ID: <REXXLIST%92082721372076@DEARN>
- Newsgroups: comp.lang.rexx
- Date: Thu, 27 Aug 1992 21:37:20 MET
- Sender: REXX Programming discussion list <REXXLIST@UGA.BITNET>
- From: Otto Stolz +49 7531 88 2645 <RZOTTO@DKNKURZ1.BITNET>
- Subject: Re: Blanks, REXX, and portability...
- Lines: 182
-
- On Wed, 26 Aug 92 18:10:54 PDT Dave Gomberg <GOMBERG@UCSFVM> said:
- > There may be a need to control print formatting, but the way to achieve
- > that is not the addition of the 127-to-an-emm space. The way to do
- > that is PCL or PostScript or TeX or some other tool designed to solve
- > the problem. You don't screw up the character set just to try to
- > control output.
-
- I agree, wholeheartedly. I did not give that list of various spaces from
- ISO 10646 to imply that all of them must be used to control output; my
- point was that they are defined in the forthcoming Universal Character
- Code, which imho is bound to stay.
-
- REXX will have to cope with major existing and forthcoming character
- codes, as characters are the stuff both REXX programs and REXX data are
- made of. Consequently, the REXX standard must be compatible with features
- these codes exhibit.
-
- Particularly regarding space characters, the REXX standard must account
- for these properties:
- - there are character codes comprising several space characters,
- - some space characters are meant to separate words, while others
- are meant to form part of a word even though their visual representation
- consists of the absence of a graphic symbol,
- - some character codes comprise tab characters (and possibly other means)
- to express the notions of either white space, or word boundaries, or
- both.
-
- TRL effectively uses the terms "blank", or "blank character", as a
- synonym for the notion of "word delimiter"; moreover, TRL tacitly
- assumes that there is only one sort of "blank" (aka "space") character
- in the underlying character code. (Cf., e.g., the sub-section on "Parsing
- strings into words" of part 2 section 9, or the definition of the SPACE
- function in part 2 section 8 of the 1st edition -- sorry, I haven't the
- 2nd edition at hand.) Now that we have seen that these tacit assumptions
- do not hold, we have to find the best compromise between the author's
- original intent, feasibility, usefulness, and adequacy (the notorious
- "least astonishing factor").
-
- In such a hermeneutic process, I came to the conclusion that
-
- 1. whenever the emphasis is on recognizing words (or tokens), REXX should
- recognize (and treat interchangeably) all characters normally used
- to delimit words
-
- (these are the language features I listed under "white space", the
- other day),
-
- 2. whenever a REXX construct is said to generate a single "blank"
- character, this should be interpreted as "one single character that will
- both act as a word-delimiter in any subsequent parsing operation and
- appear as white space on any subsequent output" -- and analogously
- for constructs generating several "blank" characters
-
- (these are the language features I listed under "blank character", the
- other day).
-
- In the UCC (ISO 10646), the following characters will definitely serve
- as word-delimiters, hence would be covered by item 1, above:
- Space
- Ideographic space
- whereas the following characters will never serve as word-delimiters,
- hence they should be treated as ordinary (non-space) characters:
- Non-breaking Space
- Figure space
- I will have to read the final text of the standard to assess the other
- sorts of space and tabulator characters.
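The distinction drawn above can be sketched in a few lines. This is only an illustration, written in Python rather than REXX, and it rests on assumptions: the code points are the ones these four characters carry in present-day Unicode / ISO 10646, and the names WORD_DELIMITERS, ORDINARY_CHARACTERS and split_words are hypothetical, not taken from TRL or any REXX standard.

```python
# Characters the post lists as definite word-delimiters (item 1).
WORD_DELIMITERS = frozenset({
    "\u0020",  # SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
})

# Characters the post lists as never delimiting words; they are kept here
# only for documentation, since the splitter treats every character that is
# not in WORD_DELIMITERS as part of a word.
ORDINARY_CHARACTERS = frozenset({
    "\u00A0",  # NO-BREAK SPACE
    "\u2007",  # FIGURE SPACE
})


def split_words(text):
    """Split text into words, treating all WORD_DELIMITERS interchangeably."""
    words = []
    current = []
    for ch in text:
        if ch in WORD_DELIMITERS:
            # A run of delimiters closes at most one word.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

With these sets, split_words("a\u3000b c") yields three words, while "1\u00A0000" stays a single word because the no-break space is treated as an ordinary character.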
-
- I am still proposing that the REXX standard should
- - distinguish between the notions of (word-delimiting) white space and
- the blank character (generated by several language features),
- - exploit the new terms consistently, and throughout (cf. my checklists
- of yesterday).
- Furthermore, I suggest that the REXX standard should present, in an
- informative annex, examples (based on popular character codes) of space
- characters and non-space characters (cf. above).
-
- On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof <ophof@SERVER.UWINDSOR.CA>
- said:
- > But I still don't see any need for more than one character being a
- > blank, ie. the character that can always be used as separator of
- > words.
-
- One crucial point is to distinguish analysing data from generating data.
- When REXX has to analyse data, the question is not whether one character
- *can* always be used, but rather whether it *will* always be used.
- Believe me: it won't :-(
-
- Another, less obvious point is whether one and the same character
- will indeed fit all word-delimiting situations. With universal
- character codes, such as ISO 10646, one size does *not* fit all!
- The reason is that ideographic scripts (Chinese, Korean, Japanese)
- require a different space character than letter-based or syllable-based
- scripts do. I have not made up my mind what the REXX (or any programming
- language) standard should do regarding this intricacy.
-
- On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
- > My concern is only for
- > portability *regardless* of system. And on all those I know (of),
- > SPACE is the only character they all recognize as word delimiter.
-
- Precisely to bring about the desired portability, REXX must act
- - as permissively as possible when analysing data, and
- - as predictably as possible when generating data.
- Your observation that SPACE is a (perhaps the) character all systems
- recognize does not guarantee portability of programs and data if at
- least one system has additional means to delimit words! (By now we all
- know that such systems do exist, and even abound.)
-
- On Thu, 27 Aug 92 06:30:35 LCL Anders Christensen said:
- > Someone posted earlier that CMS programmers often tend to regard the
- > format of data as very constant, e.g. that the interesting portion of
- > the output from command XXX starts in column 42, and is 8 characters
- > long. [...]
-
- To guarantee that positional parsing templates, SUBSTRING functions,
- and the like, work as expected, the REXX standard should explicitly
- state:
- Data passed directly from one REXX program to another REXX program will
- be delivered unaltered; in particular, seemingly irrelevant information
- (such as trailing white space) will not be removed, nor will any form
- of data reduction (such as replacing sequences of blank characters
- with tabulator characters) take place.
-
- This rule will apply to the following situations:
- - arguments passed to external REXX routines,
- - results yielded by external REXX functions,
- - data exchange via the external data queue,
- - data exchange via persistent data streams.
-
- Note: any command sent to an external environment is subject to the
- rules of that environment. Hence, when a REXX program sends a command
- to cause the environment to invoke another REXX program with a
- particular argument string, then the latter is subject to any data
- transformations normally effected by that environment.
-
- In article <9208270054.AA19196@SERVER.uwindsor.ca> Scott Ophof writes:
- > Hitting the TAB-key (etc.) eases the work of the typist. OK.
- > But the data in the file should have the correct number of blanks.
- > At the application level programmers should *not* need to concern
- > themselves with the disposition of data in a (disk) file, as Dave
- > Gomberg implies.
-
- Will REXX always be used at the application level? Do you want to
- preclude REXX being used as a system programming language?
-
- Digression:
-
- On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
- > [the character code in use] is sometimes hard to predict in advance,
- > and it might differ from machine to machine, and from login-session to
- > login-session.
-
- On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
- > This is a tough byte to swallow...
-
- Honestly, I am using a system of this sort. Even worse: the character
- code will change with every edit session you start! This is our attempt
- to cope with IBM's character code policy (can you say "un-policy"?):
- various system parts (printers, terminals, compilers, word processors,
- ...) implement different character codes. No official, IBM-defined
- I/O interface code matches the code expected by official, IBM-supplied
- compilers (you can either buy a terminal that correctly enters the braces
- for the Pascal compiler, or the brackets, but not both). Hence, we
- invoke a character translation routine whenever the user starts editing
- a Pascal program, another one when he/she starts editing a TeX source,
- and so on.
-
- This byte is so tough that SHARE Europe, the European IBM users'
- organisation, has been chewing it for 12 years.
-
- End-of-digression.
-
- On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
- > Please, by all means, standardize what a blank is, but *please*, don't
- > standardize it in such a way that it makes it impossible to use a
- > true standard Rexx interpreter on some platforms.
-
- Rather, standardize *how words are delimited*.
-
- Best wishes,
- Otto Stolz <RZOTTO@DKNKURZ1.Bitnet>
- <RZOTTO@nyx.uni-konstanz.de>
-