NetNews Usenet Archive 1992 #19

Text File | 1992-08-27 | 8.9 KB | 193 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
Path: sparky!uunet!paladin.american.edu!auvm!DKNKURZ1.BITNET!RZOTTO
Message-ID: <REXXLIST%92082721372076@DEARN>
Newsgroups: comp.lang.rexx
Date: Thu, 27 Aug 1992 21:37:20 MET
Sender: REXX Programming discussion list <REXXLIST@UGA.BITNET>
From: Otto Stolz +49 7531 88 2645 <RZOTTO@DKNKURZ1.BITNET>
Subject: Re: Blanks, REXX, and portability...
Lines: 182

On Wed, 26 Aug 92 18:10:54 PDT Dave Gomberg <GOMBERG@UCSFVM> said:
> There may be a need to control print formatting, but the way to achieve
> that is not the addition of the 127-to-an-emm space. The way to do
> that is PCL or PostScript or TeX or someother tool designed to solve
> the problem. You don't screw up the character set just to try to
> control output.

I agree, wholeheartedly. I did not give that list of various spaces
from ISO 10646 to imply that all of them must be used to control output;
my point was that they are defined in the forthcoming Universal
Character Code, which imho is bound to stay.

REXX will have to cope with major existing and forthcoming character
codes, as characters are the stuff both REXX programs and REXX data are
made of. Consequently, the REXX standard must be compatible with the
features these codes exhibit. Particularly regarding space characters,
the REXX standard must account for these properties:
- there are character codes comprising several space characters,
- some space characters are meant to separate words, other characters
  are meant to form part of a word even if their visual representation
  consists in the absence of a graphic symbol,
- some character codes comprise tab characters (and possibly other
  means) to express the notions of either white space, or word
  boundaries, or both.

TRL effectively uses the terms "blank", or "blank character", as a
synonym for the notion of "word delimiter"; moreover, TRL tacitly
assumes that there is only one sort of "blank" (aka "space") character
in the underlying character code. (Cf., e.g.,
the sub-section on "Parsing strings into words" of part 2 section 9, or
the definition of the SPACE function in part 2 section 8 of the 1st
edition -- sorry, I haven't the 2nd edition at hand.)

Now that we have seen that the tacit assumptions do not hold, we have to
find the best compromise between the author's original intent,
feasibility, usefulness, and adequacy (the notorious "least astonishing
factor"). In this hermeneutic process, I came to the conclusion that
1. whenever the emphasis is on recognizing words (or tokens), REXX
   should recognize (and treat interchangeably) all characters normally
   used to delimit words (these are the language features I listed
   under "white space", the other day),
2. whenever a REXX construct is said to generate a single "blank"
   character, this should be interpreted as "one single character that
   will both act as a word delimiter in any subsequent parsing
   operation and appear as white space on any subsequent output" --
   and analogously for constructs generating several "blank"
   characters (these are the language features I listed under "blank
   character", the other day).

In the UCC (ISO 10646), the following characters will definitely serve
as word delimiters, hence would be covered by item 1, above:
   Space
   Ideographic space
The following characters will never serve as word delimiters, hence
they shall be treated as ordinary (non-space) characters:
   Non-breaking Space
   Figure space
I will have to read the final text of the standard to assess the other
sorts of space and tabulator characters.

I am still proposing that the REXX standard should
- distinguish between the notions of (word-delimiting) white space and
  the blank character (generated by several language features),
- exploit the new terms consistently, and throughout (cf. my checklists
  of yesterday).

Furthermore I suggest that the REXX standard should present, in an
informative annex, examples (based on popular character codes) of space
characters and non-space characters (cf.
above).

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof <ophof@SERVER.UWINDSOR.CA> said:
> But I still don't see any need for more than one character being a
> blank, ie. the character that can always be used as separator of
> words.

One crucial point is to distinguish analysing data from generating
data. When REXX has to analyse data, the question is not whether one
character *can* always be used, but rather whether it *will* always be
used. Believe me: it won't :-(

Another, less obvious, point is, whether indeed one-and-the-same
character will fit all word-delimiting situations. With universal
character codes, such as ISO 10646, one size does *not* fit all! The
reason is that ideographic scripts (Chinese, Korean, Japanese) require a
different space character from letter, or syllable, based scripts. I
have not made up my mind what the REXX (or any programming language)
standard should do regarding this intricacy.

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
> My concern is only for
> portability *regardless* of system. And on all those I know (of),
> SPACE is the only character they all recognize as word delimiter.

Precisely to bring about the desired portability, REXX must act
- as permissively as possible when analysing data, and
- as predictably as possible when generating data.

Your observation that SPACE is a (perhaps the) character all systems
recognize does not guarantee portability of programs and data, if at
least one system has additional means to delimit words! (We all know
meanwhile that indeed these systems exist, even abound.)

On Thu, 27 Aug 92 06:30:35 LCL Anders Christensen said:
> Someone posted earlier that CMS programmers often tend to regard the
> format of data as very constant, e.g. that the interesting portion of
> the output from command XXX starts in column 42, and is 8 characters
> long. [...]

To guarantee that positional parsing templates, SUBSTRING functions, and
the like, work as expected, the REXX standard should explicitly state:

Data passed directly from one REXX program to another REXX program will
be delivered unaltered; in particular, seemingly irrelevant information
(such as trailing white space) will not be removed, nor will any form of
data reduction (such as replacing sequences of blank characters with
tabulator characters) take place.

This rule will apply to the following situations:
- arguments passed to external REXX routines,
- results yielded by external REXX functions,
- data exchange via the external data queue,
- data exchange via persistent data streams.

Note: any command sent to an external environment is subject to the
rules of that environment. Hence, when a REXX program sends a command
to cause the environment to invoke another REXX program with a
particular argument string, then the latter is subject to any data
transformations normally effected by that environment.

In article <9208270054.AA19196@SERVER.uwindsor.ca> Scott Ophof writes:
> Hitting the TAB-key (etc.) eases the work of the typist. OK.
> But the data in the file should have the correct number of blanks.
> At the application level programmers should *not* need to concern
> themselves with the disposition of data in a (disk) file, as Dave
> Gomberg implies.

Will REXX always be used at the application level? Do you want to
preclude REXX being used as a system programming language?

Digression:

On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
> [the character code in use] is sometimes hard to predict in advance,
> and it might differ from machine to machine, and from login-session to
> login-session.

On Wed, 26 Aug 92 20:54:12 EDT Scott Ophof said:
> This is a tough byte to swallow...

Honestly, I am using a system of this sort. Even worse: the character
code will change with every edit session you start!
This is our attempt to cope with IBM's character code policy (can you
say "un-policy"?): various system parts (printers, terminals,
compilers, word processors, ...) implement different character codes.
No official, IBM-defined I/O interface code matches the code expected by
the official, IBM-supplied compilers (you can buy a terminal that
correctly enters either the braces for the Pascal compiler or the
brackets, but not both). Hence, we invoke a character translation
routine whenever the user starts editing a Pascal program, another one
when he/she starts editing a TeX source, and so on.

This byte is so tough that SHARE Europe, the European IBM users'
organisation, has been chewing it for 12 years.

End-of-digression.

On Wed, 26 Aug 1992 08:56:20 GMT Anders Christensen said:
> Please, by all means, standardize what a blank is, but *please*, don't
> standardize it in such a way, that it makes it impossible to use the a
> true standard Rexx interpreter on some platforms.

Rather, standardize *how words are delimited*.

Best wishes,
   Otto Stolz <RZOTTO@DKNKURZ1.Bitnet>
              <RZOTTO@nyx.uni-konstanz.de>
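
Illustration (not part of the original posting): the distinction argued
for above, permissive recognition of word delimiters when analysing
data versus one predictable "blank" when generating it, might be
sketched roughly as follows. This is a sketch in Python, not REXX, and
makes no claim about what any REXX standard specifies. The function
names split_words and join_words are hypothetical, and the code points
are an assumption taken from ISO 10646 / Unicode as later published.
Tabulators are deliberately left out, since the posting leaves their
classification open.

```python
import re

# Assumption: code points as assigned in ISO 10646 / Unicode.
# Characters the posting says will definitely delimit words:
WORD_DELIMITERS = "\u0020\u3000"   # SPACE, IDEOGRAPHIC SPACE
# Characters the posting says must be treated as ordinary characters:
NON_DELIMITERS = "\u00a0\u2007"    # NO-BREAK SPACE, FIGURE SPACE

# One or more consecutive word delimiters form a single word boundary.
_DELIMITER_RUN = re.compile("[\u0020\u3000]+")


def split_words(text):
    """Permissive analysis: accept any run of word delimiters."""
    return [word for word in _DELIMITER_RUN.split(text) if word]


def join_words(words):
    """Predictable generation: always emit exactly one SPACE."""
    return "\u0020".join(words)


if __name__ == "__main__":
    # NO-BREAK SPACE stays inside the word; IDEOGRAPHIC SPACE splits.
    sample = "New\u00a0York\u3000is  big"
    words = split_words(sample)
    print(len(words), repr(join_words(words)))
```

Note that Python's own str.split() would not do here: it also treats
NO-BREAK SPACE and FIGURE SPACE as white space, which is exactly the
conflation the posting argues against, hence the explicit delimiter set.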