- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!gatech!paladin.american.edu!auvm!DKNKURZ1.BITNET!RZOTTO
- X-Acknowledge-To: <RZOTTO@DKNKURZ1>
- Message-ID: <REXXLIST%92082621432914@DEARN>
- Newsgroups: comp.lang.rexx
- Date: Wed, 26 Aug 1992 18:41:09 MEZ
- Sender: REXX Programming discussion list <REXXLIST@UGA.BITNET>
- From: Otto Stolz <RZOTTO@DKNKURZ1.BITNET>
- Subject: Re: Blanks, REXX, and portability...
- In-Reply-To: Message of Wed, 26 Aug 92 08:31:32 PDT from <GOMBERG@UCSFVM>
- Lines: 152
-
- On Wed, 26 Aug 92 08:31:32 PDT Dave Gomberg said:
- > [...] There is no NEED for lots of different blank characters.
-
- May I humbly object:
-
- In June, ISO DIS 10646-1 "Universal Character Code" was approved as an
- international standard, which will presumably be published in early
- 1993. Major vendors have expressed their intention to implement this
- standard, which is meant as a possible replacement for ASCII (and the
- other national variants of ISO 646-1983 "ISO 7-bit Character Set for
- Information Interchange"), the ISO 8859 series "8-bit single-byte coded
- character sets", and a host of other standard or proprietary character
- codes. This standard will comprise (besides, of course, the good old
- ASCII tabs) the following space variants:
- Space
- Non-breaking Space
- Ideographic Space
- n-Space
- m-Space
- 3-per-m-space
- 4-per-m-space
- 6-per-m-space
- figure space
- punctuation space
- thin space
- hair space
- zero-width space
- Obviously, there *are* people seeing a need ... :-)
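-
- As an illustration, the list above can be looked up in the character
- database of a present-day language. The following is a minimal Python
- sketch (not REXX). The code points are an assumption: they are the ones
- assigned in today's Unicode / ISO 10646 repertoire, and SPACE_VARIANTS
- and describe are names made up for this example.

```python
import unicodedata

# Assumption: code points as assigned in today's Unicode / ISO 10646,
# listed in the same order as the space variants named in the post.
SPACE_VARIANTS = [
    0x0020,  # space
    0x00A0,  # non-breaking space
    0x3000,  # ideographic space
    0x2002,  # n-space
    0x2003,  # m-space
    0x2004,  # 3-per-m space
    0x2005,  # 4-per-m space
    0x2006,  # 6-per-m space
    0x2007,  # figure space
    0x2008,  # punctuation space
    0x2009,  # thin space
    0x200A,  # hair space
    0x200B,  # zero-width space
]


def describe(code_point):
    """Return (U+XXXX label, database name, result of str.isspace())."""
    ch = chr(code_point)
    return (f"U+{code_point:04X}", unicodedata.name(ch), ch.isspace())


if __name__ == "__main__":
    for cp in SPACE_VARIANTS:
        label, name, is_space = describe(cp)
        print(f"{label}  {name:<20} isspace={is_space}")
```

- Python's own str.isspace() answers False for the zero-width space, so
- even present-day notions of "white space" need not coincide; this is
- one reason the term wants an explicit definition.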
-
- Apparently, the trend goes towards more sophistication in word processing
- and finer control over printing devices. Another trend is towards
- reconciling electronic data processing and typesetting. Generally
- speaking, our programming languages must cease to presuppose the
- (obsolete!) typewriter-based notion of "a character is a byte is a
- writing position". I guess you are *not* advocating that REXX should
- become extinct in the forthcoming EDP world :-)
-
- REXX can cope well with any character code, if it is defined and imple-
- mented consistently. Regarding white space, this would involve
- 1. that the term "white space" (or an equivalent one) be defined in
- the forthcoming standard in a code-independent way,
- 2. that all language features recognising words, or depending in other
- ways on the meaning of white space, be identified, and that the
- standard require consistent implementation of these features, e.g.
- - tokenizing of the REXX source program, and of the INTERPRET
- statement's operand (cf. note 1, below),
- - variables, and dots, in parsing templates,
- - WORD, WORDINDEX, WORDPOS, SUBWORD, and WORDS, functions,
- - STRIP (cf. note 2), and SPACE, functions,
- - weak character comparison (so-called "normal" comparative operators
- applied to non-decimal operands),
- - padding default in COMPARE function (cf. note 2),
- - DATATYPE function,
- - white space in operands of numeric operations (including functions)
- (Warning: This list may not be exhaustive);
- 3. that the term "blank character" (or an equivalent one) be defined in
- the standard as one definite character belonging to the constituents
- of white space (cf. item 1, supra);
- 4. that all language features generating white space (including the
- defaults for pad characters) be identified, and that the standard
- require consistent implementation of these, viz. generation of
- blank characters, e.g. by
- - concatenating terms with one blank in between (expressed by
- white space in lieu of an operator),
- - default padding character in CENTER, LEFT, RIGHT, INSERT,
- OVERLAY, SPACE, SUBSTR, and TRANSLATE, functions,
- - padding of the shorter operand in weak character comparisons,
- - the FORMAT function
- (Warning: This list may not be exhaustive);
- 5. that standard-conforming implementations be required to implement
- the recognition of white space in a way conforming
- - to all possible sources for REXX source programs,
- - to all possible sources for input to REXX programs
- (cf. note 1);
- 6. that standard-conforming implementations be required to implement
- the blank character in a way conforming
- - to all possible environments REXX programs may address,
- - to all possible sinks for output from REXX programs
- (cf. note 1).
-
- In a nutshell: REXX language features should be as permissive as
- possible when accepting white space, and as predictable
- as possible when generating it. To achieve this goal, the
- standard should replace the notions of "blank", "blanks"
- or "blank characters" with "white space" whenever
- characters are inspected, and replace them with "blank
- character" or "blank characters" whenever characters are
- generated.
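-
- The rule "permissive on input, predictable on output" can be sketched
- as follows. This is a Python model, not REXX, and it is only a sketch:
- the contents of WHITE_SPACE are a hypothetical choice (the standard
- would have to define the real set), and words and left merely imitate
- the general behaviour of the WORD family and of LEFT.

```python
# Assumption: a hypothetical white-space set chosen for illustration only.
WHITE_SPACE = frozenset(" \t\u00a0\u2002\u2003\u2009\u3000")

# The single character that is ever generated (item 3 of the list).
BLANK = " "


def words(text):
    """Inspecting: any run of white-space characters separates words."""
    result, current = [], []
    for ch in text:
        if ch in WHITE_SPACE:
            if current:
                result.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        result.append("".join(current))
    return result


def left(text, length, pad=BLANK):
    """Generating: truncate or pad on the right; the default pad is BLANK."""
    return text[:length] + pad * max(0, length - len(text))
```

- For example, words("a\u2003b\tc") yields the three words a, b and c
- whichever separators were used, whereas left("ab", 4) always pads with
- the one blank character.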
-
- Note 1: Another recent contribution to REXXLIST stated that REXX source
- code, and REXX operands, might be represented in different
- character codes (perhaps including different notions of white
- space). A cursory scan through TRL did not reveal any support
- for this statement.
-
- REXX source code and REXX operands are tightly coupled (actually
- tighter than in any other programming language I am aware of) by
- several language features, e.g.
- - literal strings,
- - INTERPRET statement,
- - SYMBOL, and VALUE, functions,
- - SOURCELINE function,
- - VALUE sub-keywords of ADDRESS, SIGNAL, and TRACE statements,
- - ADDRESS, and TRACE, functions.
-
- To me, this tight coupling suggests that source program and
- operands should ideally be expressed in the same character
- code. If the standard does not require this, it must give
- precise, and simple, rules for how each of these language
- features shall handle the discrepancies -- while trying to
- minimize the astonishment factor.
-
- For less consistent systems, the REXX implementation will somehow
- have to level out the irregularities (regarding character codes,
- particularly white space). The standard may choose to provide
- suitable OPTIONS operands to assist in this regard.
-
- Note 2: By default, the STRIP function should remove leading and/or
- trailing white space (rather than blank characters). By default,
- the COMPARE function should ignore white space in the excess
- part of the longer operand (in other words: when no pad character
- is specified, COMPARE should return 0 iff the longer operand
- consists of an exact copy of the shorter operand, followed by any
- amount of white space, and it should return the position of the
- first non-white character in the excess part of the longer
- operand iff the latter consists of an exact copy of the shorter
- operand followed by a string containing at least one non-white
- character).
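-
- The proposed COMPARE default can be spelled out as a small Python
- model (not REXX). It is a sketch under assumptions: WHITE_SPACE is a
- hypothetical set, and a mismatch inside the common part is taken to
- return its position, as COMPARE ordinarily does.

```python
# Assumption: a hypothetical white-space set chosen for illustration only.
WHITE_SPACE = frozenset(" \t\u00a0\u2002\u2003\u2009\u3000")


def compare(a, b):
    """Model of COMPARE without a pad argument, as proposed in Note 2.

    Returns 0 if the operands match, otherwise the 1-based position of
    the first character that makes them differ.
    """
    common = min(len(a), len(b))

    # A difference inside the common part is reported directly.
    for i in range(common):
        if a[i] != b[i]:
            return i + 1

    # In the excess part of the longer operand, white space is ignored;
    # the first non-white character there is the point of difference.
    longer = a if len(a) > len(b) else b
    for i in range(common, len(longer)):
        if longer[i] not in WHITE_SPACE:
            return i + 1

    return 0
```

- With this model, compare("abc", "abc \t") gives 0, while
- compare("abc", "abc  x") gives 6, the position of the x.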
-
- Note that these defaults cannot explicitly be specified via the
- currently valid interface. This idiosyncrasy could be removed
- by an additional (yet minor, and upwards-compatible) language
- extension:
- - For the STRIP function, the standard could allow an arbitrary
- string rather than a single character as its 3rd argument;
- the meaning would be to remove sequences of any characters
- specified.
- - For the COMPARE function, the standard could allow an arbitrary
- string rather than a single character as its 3rd argument;
- the meaning would be to ignore, in the excess part of the
- longer operand, sequences of any characters specified.
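-
- The extended STRIP can be modelled along the same lines. Again this is
- a Python sketch rather than REXX, written under assumptions: the
- argument order (string, option, characters) and the option letters
- B, L and T follow the existing STRIP, and the only change is that the
- third argument may hold several characters. COMPARE could be extended
- analogously.

```python
def strip(text, option="B", chars=" "):
    """Model of the proposed STRIP: chars may hold several characters.

    option: "B" strips both ends, "L" the leading end, "T" the trailing
    end. Any leading/trailing sequence made up solely of characters
    found in chars is removed.
    """
    option = option[:1].upper()
    if option not in ("B", "L", "T"):
        raise ValueError("option must start with B, L or T")

    start, end = 0, len(text)
    if option in ("B", "L"):
        while start < end and text[start] in chars:
            start += 1
    if option in ("B", "T"):
        while end > start and text[end - 1] in chars:
            end -= 1
    return text[start:end]
```

- A caller who wants the white-space default spelled out explicitly
- would then pass the white-space characters as the third argument,
- e.g. strip(line, "B", " \t\u00a0").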
-
- Note that there is no need to define a canonical form for white space in
- operands, as there would be REXX functions to accomplish any desired
- transformations, if the above items 1 to 6 became standard. Particularly,
- stretches of white space could be easily transformed to single blanks
- (to allow for a sensible comparison) by applying the SPACE function.
- Note also that this function already has the suggestive name of SPACE
- rather than BLANK :-)
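-
- The normalisation step mentioned above might look as follows in a
- Python model of SPACE (not REXX). It is a sketch only: WHITE_SPACE is
- a hypothetical set, and space imitates just the general behaviour of
- the SPACE function, i.e. words separated by n copies of the pad.

```python
# Assumption: a hypothetical white-space set chosen for illustration only.
WHITE_SPACE = " \t\u00a0\u2002\u2003\u2009\u3000"
BLANK = " "


def space(text, n=1, pad=BLANK):
    """Model of SPACE: words separated by n pads, no leading/trailing pad."""
    # Map every white-space character to the blank, then split on it;
    # empty pieces come from runs of white space and are dropped.
    table = {ord(ch): BLANK for ch in WHITE_SPACE}
    pieces = [w for w in text.translate(table).split(BLANK) if w]
    return (pad * n).join(pieces)
```

- Two operands that differ only in their white space then compare equal
- after normalisation: space("x\u00a0y") == space("x   y").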
-
- Best wishes,
- Otto Stolz <RZOTTO@DKNKURZ1.Bitnet>
- <RZOTTO@nyx.uni-konstanz.de>
-