NetNews Usenet Archive 1992 #19

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!gatech!paladin.american.edu!auvm!DKNKURZ1.BITNET!RZOTTO
X-Acknowledge-To: <RZOTTO@DKNKURZ1>
Message-ID: <REXXLIST%92082621432914@DEARN>
Newsgroups: comp.lang.rexx
Date: Wed, 26 Aug 1992 18:41:09 MEZ
Sender: REXX Programming discussion list <REXXLIST@UGA.BITNET>
From: Otto Stolz <RZOTTO@DKNKURZ1.BITNET>
Subject: Re: Blanks, REXX, and portability...
In-Reply-To: Message of Wed, 26 Aug 92 08:31:32 PDT from <GOMBERG@UCSFVM>
Lines: 152

On Wed, 26 Aug 92 08:31:32 PDT Dave Gomberg said:
> [...] There is no NEED for lots of different blank characters.

May I humbly object: In June, ISO DIS 10646-1 "Universal Character Code" was approved as an international standard, which will presumably be published in early 1993. Major vendors have expressed their intention to implement this standard, which is meant as a possible replacement for ASCII (and the other national variants of ISO 646-1983 "ISO 7-bit Character Set for Information Interchange"), the ISO 8859 series "8-bit single-byte coded character sets", and a host of other standard or proprietary character codes.

This standard will comprise (besides, of course, the good old ASCII tabs) the following space variants:
- space
- non-breaking space
- ideographic space
- en space
- em space
- three-per-em space
- four-per-em space
- six-per-em space
- figure space
- punctuation space
- thin space
- hair space
- zero-width space

Obviously, there *are* people seeing a need ... :-)

Apparently, the trend is towards more sophistication in word processing and finer control over printing devices. Another trend is towards reconciling electronic data processing and typesetting. Generally speaking, our programming languages must cease to presuppose the (obsolete!) typewriter-based notion of "a character is a byte is a writing position".
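
Not every character called a "space" is white space under every definition. A minimal sketch in Python (used here purely for illustration) classifies the variants listed above by their ISO 10646/Unicode code points, assuming Python's `str.isspace()` as one concrete definition of white space; the names `SPACE_VARIANTS` and `classify` are hypothetical and introduced only for this example.

```python
import unicodedata

# The space variants listed above, by code point (names as in ISO 10646 / Unicode).
SPACE_VARIANTS = [
    "\u0020",  # SPACE
    "\u00a0",  # NO-BREAK SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
    "\u2004",  # THREE-PER-EM SPACE
    "\u2005",  # FOUR-PER-EM SPACE
    "\u2006",  # SIX-PER-EM SPACE
    "\u2007",  # FIGURE SPACE
    "\u2008",  # PUNCTUATION SPACE
    "\u2009",  # THIN SPACE
    "\u200a",  # HAIR SPACE
    "\u200b",  # ZERO WIDTH SPACE
]


def classify(chars):
    """Return (name, code point, counts as white space?) for each character.

    Assumption: str.isspace() stands in for "white space"; it is only one
    possible definition, not one any REXX standard has adopted.
    """
    return [(unicodedata.name(c), f"U+{ord(c):04X}", c.isspace()) for c in chars]


if __name__ == "__main__":
    for name, cp, ws in classify(SPACE_VARIANTS):
        print(f"{cp}  {name:<20} white space: {ws}")
```

Under this particular definition every listed variant counts as white space except the zero-width space (U+200B), which illustrates why a language standard should state its own definition rather than silently inherit one.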

I guess you are *not* advocating that REXX should become extinct in the forthcoming EDP world :-)

REXX can cope well with any character code, if it is defined and implemented consistently. Regarding white space, this would involve

1. that the term "white space" (or an equivalent one) be defined in the forthcoming standard in a code-independent way;
2. that all language features recognising words, or depending in other ways on the meaning of white space, be identified, and that the standard require consistent implementation of these features, e.g.
   - tokenizing of the REXX source program, and of the INTERPRET statement's operand (cf. note 1, below),
   - variables and dots in parsing templates,
   - the WORD, WORDINDEX, WORDPOS, SUBWORD, and WORDS functions,
   - the STRIP (cf. note 2) and SPACE functions,
   - weak character comparison (the so-called "normal" comparison operators applied to non-numeric operands),
   - the padding default of the COMPARE function (cf. note 2),
   - the DATATYPE function,
   - white space in operands of numeric operations (including functions)
   (Warning: This list may not be exhaustive);
3. that the term "blank character" (or an equivalent one) be defined in the standard as one definite character belonging to the constituents of white space (cf. item 1, supra);
4. that all language features generating white space (including the defaults for pad characters) be identified, and that the standard require consistent implementation of these, viz. generation of blank characters, e.g. by
   - concatenating terms with one blank in between (expressed by white space in lieu of an operator),
   - the default padding character in the SPACE, CENTER, LEFT, RIGHT, INSERT, OVERLAY, SUBSTR, and TRANSLATE functions,
   - padding of the shorter operand in weak character comparisons,
   - the FORMAT function
   (Warning: This list may not be exhaustive);
5. that standard-conforming implementations be required to implement the recognition of white space in a way conforming
   - to all possible sources for REXX source programs,
   - to all possible sources for input to REXX programs (cf. note 1);
6. that standard-conforming implementations be required to implement the blank character in a way conforming
   - to all possible environments REXX programs may address,
   - to all possible sinks for output from REXX programs (cf. note 1).

In a nutshell: REXX language features should be as permissive as possible when accepting white space, and as predictable as possible when generating it. To achieve this goal, the standard should replace the notions of "blank", "blanks", or "blank characters" with "white space" wherever characters are inspected, and with "blank character" or "blank characters" wherever characters are generated.

Note 1: Another recent contribution to REXXLIST stated that REXX source code and REXX operands might be represented in different character codes (perhaps including different notions of white space). A cursory scan through TRL did not reveal any support for this statement. REXX source code and REXX operands are tightly coupled (actually more tightly than in any other programming language I am aware of) by several language features, e.g.
- literal strings,
- the INTERPRET statement,
- the SYMBOL and VALUE functions,
- the SOURCELINE function,
- the VALUE sub-keywords of the ADDRESS, SIGNAL, and TRACE statements,
- the ADDRESS and TRACE functions.
To me, this tight coupling suggests that source program and operands should ideally be expressed in the same character code. If the standard does not require this, it must give precise and simple rules for how each of these language features shall handle the discrepancies, while trying to minimize the astonishment factor. For less consistent systems, the REXX implementation will somehow have to level out the irregularities (regarding character codes, particularly white space).
The standard may choose to provide suitable OPTIONS operands to assist in this regard.

Note 2: By default, the STRIP function should remove leading and/or trailing white space (rather than blank characters). By default, the COMPARE function should ignore white space in the excess part of the longer operand. In other words: when no pad character is specified, COMPARE should return 0 iff the longer operand consists of an exact copy of the shorter operand followed by any amount of white space, and it should return the position of the first non-white character in the excess part of the longer operand iff the latter consists of an exact copy of the shorter operand followed by anything but white space.

Note that these defaults cannot be specified explicitly through the current interface. This idiosyncrasy could be removed by a minor, upwards-compatible language extension:
- For the STRIP function, the standard could allow an arbitrary string rather than a single character as its 3rd argument; the meaning would be to remove sequences of any of the characters specified.
- For the COMPARE function, the standard could allow an arbitrary string rather than a single character as its 3rd argument; the meaning would be to ignore, in the excess part of the longer operand, sequences of any of the characters specified.

Note that there is no need to define a canonical form for white space in operands, as there would be REXX functions to accomplish any desired transformation if the above items 1 to 6 became standard. In particular, stretches of white space could easily be transformed to single blanks (to allow for a sensible comparison) by applying the SPACE function. Note also that this function already has the suggestive name of SPACE rather than BLANK :-)

Best wishes,
   Otto Stolz <RZOTTO@DKNKURZ1.Bitnet>
              <RZOTTO@nyx.uni-konstanz.de>
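
The rules proposed above are precise enough to model. The following Python sketch is hypothetical: it assumes Python's `str.isspace()` as the standard's "white space" and U+0020 as its single "blank character", and the names `is_white`, `rexx_words`, `rexx_word`, `rexx_space`, `rexx_strip`, and `rexx_compare` are invented for illustration; they are not the API of any REXX implementation. It accepts any white space when recognizing words (item 2), generates only blank characters (item 4), and implements the STRIP and COMPARE defaults and string-valued third arguments suggested in Note 2.

```python
from typing import Optional

# Hypothetical model of the proposal above, NOT an existing REXX API.
# Assumption: str.isspace() plays the role of the standard's "white space",
# and U+0020 plays the role of its single "blank character".
BLANK = " "


def is_white(c: str) -> bool:
    """Recognition predicate for white space (item 1), assumed here."""
    return c.isspace()


def rexx_words(s: str) -> int:
    """WORDS: any white space separates words (item 2)."""
    return len(s.split())  # split() with no argument splits on white space


def rexx_word(s: str, n: int) -> str:
    """WORD: the n-th word (1-based), or the null string if there is none."""
    words = s.split()
    return words[n - 1] if 1 <= n <= len(words) else ""


def rexx_space(s: str, n: int = 1, pad: str = BLANK) -> str:
    """SPACE: white space is recognized, but only pad (default blank) is generated (item 4)."""
    return (pad * n).join(s.split())


def rexx_strip(s: str, option: str = "B", chars: Optional[str] = None) -> str:
    """STRIP per Note 2: remove white space by default; a string argument
    removes leading/trailing runs of any of its characters."""
    option = option[:1].upper()
    if option == "L":
        return s.lstrip(chars)  # chars=None strips white space
    if option == "T":
        return s.rstrip(chars)
    return s.strip(chars)


def rexx_compare(a: str, b: str, pad: Optional[str] = None) -> int:
    """COMPARE per Note 2: 0 if equal, else the 1-based position of the first
    mismatch. In the excess part of the longer operand, white space (default)
    or any character of pad is ignored."""
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            return i + 1
    longer = a if len(a) > len(b) else b
    for i in range(common, len(longer)):
        c = longer[i]
        ignorable = is_white(c) if pad is None else c in pad
        if not ignorable:
            return i + 1
    return 0
```

For example, `rexx_compare("abc", "abc\u00a0\t ")` returns 0 under the proposed default, whereas a COMPARE that pads only with the blank character would report a mismatch at position 4. Likewise, `rexx_space("a\u2003\tb")` yields `"a b"`, the single-blank form the last paragraph of Note 2 recommends before comparing.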