- Path: sparky!uunet!gatech!bloom-beacon!eru.mt.luth.se!lunic!sunic!aun.uninett.no!ugle.unit.no!ugle!anders
- From: anders@lise3.lise.unit.no (Anders Christensen)
- Newsgroups: comp.lang.rexx
- Subject: Re: Blanks, REXX, and portability...
- Message-ID: <ANDERS.92Aug26105620@lise3.lise.unit.no>
- Date: 26 Aug 92 08:56:20 GMT
- References: <9208260321.AA05688@SERVER.uwindsor.ca>
- Sender: news@ugle.unit.no (NetNews Administrator)
- Organization: /home/flipper/anders/.organization
- Lines: 192
- In-Reply-To: ophof@SERVER.UWINDSOR.CA's message of 26 Aug 92 03:21:29 GMT
-
- In article <9208260321.AA05688@SERVER.uwindsor.ca> ophof@SERVER.UWINDSOR.CA (Scott Ophof) writes:
-
- > Question: When are blanks not only the space character?
- > Answer: In any case in Unix.
-
- Answer is incorrect .... Correct answer (for any ASCII-based system):
-
- In some cases, blanks are tabs and in some cases blanks are spaces. It
- is sometimes hard to predict which in advance, and it might differ
- from machine to machine, and from login-session to login-session.
-
- > In "The REXX Language" (Mike Cowlishaw) and most other REXX
- > publications meant for CMS and equivs, the character known as
- > "space" and the concept of "blanks" are used interchangeably.
-
- TRL always uses the terms 'blank' and 'blanks', I believe?
-
- > For CMS-REXX (and analogous implementations), this poses no
- > problems. The space character is the *only* character defined
- > as a "blank".
- > (Note that I'm *not* talking about non-printable characters,
- > and the space char *is* a printable character!)
-
- Actually, this is an EBCDIC vs ASCII conflict, rather than a
- CMS vs the-rest-of-the-world one (is there a difference, btw? :-)
-
- In ASCII, the following characters are often considered 'whitespace',
- listed in decreasing order of 'whitespaceness' (codes in decimal):
-
- ascii 32 - space
- ascii 9 - HT (horizontal tab)
- ascii 10 - LF (line feed)
- ascii 13 - CR (carriage return)
- ascii 12 - NP (new page, or FF - formfeed)
- ascii 11 - VT (vertical tab)
-
- There might be even more. And worse, in some modes, I think characters
- above 128 are space characters, like hard-space (a space at which a
- line cannot be broken). In particular the HT is considered
- whitespace, since it is conceptually a number of compressed space
- characters (customarily 2-8).
-
- > Under Unix however, the tab character (and some others) are
- > considered "blanks", though it's called "whitespace" there.
- > At least some REXX implementations for Unix recognize more than
- > the space char as whitespace. And REXX on the PC recognizes at
- > least the space char and the tab char as whitespace/blank...
-
- Well ... try this if you are using Unix:
-
- who | od -a
-
- Now, is all the whitespace spaces, or are there any tabs (ht) mixed into
- the output? *Many* of the Unix commands use tab and other whitespace
- characters in the output. Reason: to save characters. If you are using
- a 300 baud modem line, compressing 8 spaces to a tab is a Major
- Advancement of Civilization.
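
Whether tabs in such output matter to a script depends on whether the interpreter at hand treats HT as a blank. A minimal probe, assuming an ASCII-based system (so that '09'x is the tab) and using only the WORDS() builtin, might look like this; the variable name is only illustrative:

```rexx
/* Probe: does this interpreter treat HT as a blank in word parsing? */
/* Assumes ASCII, where the horizontal tab is '09'x.                 */
probe = 'a' || '09'x || 'b'

if words(probe) = 2 then say 'tab counts as a blank here'
else say 'tab is ordinary data here'
```

The result is deliberately not stated here, since it is exactly the implementation-dependent behaviour under discussion.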
-
- > My point?
- > I would hate to have to port to CMS any REXX program written for
- > Unix (or PC); to have a program fail due to something like this
- > would not be very easy to debug...
-
- I think you would hate even more to port a rexx program from one unix
- machine to another unix machine, and have the program fail due to one
- of the machines being more intelligent about compressing spaces to
- tabs. And it is far more probable that you would do that than port
- between CMS and Unix! And this is not just a Unix problem; it is more
- or less an ASCII problem.
-
- If the ANSI REXX committee requires that blank have one specific
- character code within each character set, then IMHO the committee has
- made Rexx harder to move *from* EBCDIC (i.e. IBM mainframes), not
- eased the spreading of Rexx to the rest of the computing community.
- Most machines use multiple characters as blanks, and Rexx should not
- be limited to just those machines which have One Blank character.
-
- By the way ... I really can't see the problem. Unix generates tabs
- and spaces as whitespace, the Unix rexx interpreter interprets both as
- blanks. No problem!
-
- You port it to CMS: CMS generates spaces as whitespace, the CMS
- interpreter interprets only space as blanks. No problem.
-
- You ftp a file from Unix to CMS. You don't use binary mode, since the
- ASCII code would be unreadable on an EBCDIC machine anyway. So you use
- text mode, and your CMS machine translates the text to EBCDIC,
- including translating the tabs to spaces. No problem.
-
- You ftp a file from CMS to Unix. For the same reason as above, you use
- text mode, and all the CMS spaces become Unix spaces. No problem.
-
- Your Unix code contains parse patterns like '09'x to match a tab, you
- take your program over to CMS. But if you make assumptions about the
- glyphs of the characters, you're in trouble anyway. In fact, if your
- Unix Rexx interpreter interpreted tab as a whitespace character, you
- probably wouldn't have had to parse on the '09'x pattern in the first
- place. No problem.
-
- Where is the problem, Scott?
-
- (Oops, I seem to have assumed that all Unix machines are ASCII, which
- is probably not correct; interpret "CMS" as EBCDIC-based, and Unix as
- ASCII-based in the list above, that is more what I meant.)
-
- > My suggestion?
- > In the interest of increasing the chance of successful porting, to
- > request the ANSI-REXX committee to define that the *only* blank/
- > whitespace recognized in standard REXX is the SPACE character (ASCII
- > hex-20, EBCDIC hex-40).
-
- And there is indeed also a very good chance that this suggestion (if
- accepted) will make any rexx interpreter for Unix rather useless, or
- at best, just redline the astonishment factor (see end of posting for
- an example).
-
- Instead, perhaps the ANSI REXX committee should look over the shoulder
- of the ANSI C committee at how they solved the problem of
- whitespace, in particular their definition of the isspace() function.
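
To make the isspace() idea concrete, here is a rough Rexx sketch of a blank test driven by a host-defined set of characters. The function name isblank and the variable blanks are hypothetical, not part of any Rexx implementation. The set shown is the six ASCII characters that C's isspace() accepts in the default "C" locale; an EBCDIC host would supply a different set.

```rexx
/* Sketch: an isspace()-like test with a host-defined blank set.     */
/* 'blanks' holds the ASCII set: space, HT, LF, VT, FF, CR.          */
blanks = ' ' || '09'x || '0A'x || '0B'x || '0C'x || '0D'x

say isblank(' ')      /* space */
say isblank('09'x)    /* horizontal tab */
say isblank('x')      /* ordinary data */
exit

/* Returns 1 if c is a single character found in the blank set.      */
isblank: procedure expose blanks
  parse arg c
  return length(c) = 1 & pos(c, blanks) > 0
```

The point of the design is that only the contents of the set change from system to system; code that asks "is this a blank?" stays the same.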
-
- If the ANSI REXX committee determines that one particular character in
- ASCII and one particular character in EBCDIC is to be considered the
- Only True Blank, it might even have consequences for using Rexx with
- national characters (locales) (yes, in some systems, blanks may even
- depend on the language chosen!).
-
- Section one of TRL states that Rexx "... involve the use of two
- character sets": one used for the rexx script (source code), and one
- used by the interpreter during execution (data). If you want to define
- that a rexx script may only include certain characters as blanks when
- the source code is parsed (except in quotes and comments), that's
- fine. However, if you suggest that only certain characters should be
- recognized by the interpreter as blanks during execution (as data), then
- I fear Real Trouble.
-
- I also want to question whether appointing specific character codes as
- the Only True Blank, is in the spirit of TRL. To quote section 1:
- "[...] this book uses characters to convey meaning and not to imply a
- specific character code [...] At no time is REXX concerned with the
- glyph (actual appearance) of a character."
-
- The way I read TRL, a Blank is the common character used as a Blank in
- the operating system that you are running. Consequently, if your
- operating system has more than one blank character, all these are
- interpreted as blanks. However, as the default pad character in the
- builtin functions, it would be appropriate to require that one single
- character is used consistently. But I still think it should be beyond
- the definition of Rexx as a language to specify which character that
- is in the various character sets.
-
- > Your comments? :-)
-
- Please, by all means, standardize what a blank is, but *please* don't
- standardize it in such a way that it makes it impossible to use a
- true standard Rexx interpreter on some platforms.
-
- My suggestion?
- In the interest of increasing the chance of successful porting of the
- Rexx standard itself from EBCDIC to ASCII based systems, to request
- that the ANSI-REXX committee explicitly allow the common whitespace
- characters of the host operating system to be interpreted as 'blank'
- characters, and that the definition of exactly what is blank be
- implementation-dependent and system-dependent.
-
- -anders
-
- So, the example I promised. The output from the command 'who' on
- this machine (ultrix) currently is:
-
- > jorgens ttyp0 Aug 24 10:51 (129.241.27.23:0.)
- > anders ttyp3 Aug 26 09:05 (129.241.36.3:0.0)
-
- Suppose the address syntax made it possible to push something on the
- stack, then the following Rexx program ought to work:
-
-     address unix to queue 'who'
-     do queued()
-        parse pull user . . . time node
-        say user 'is logged in from' node 'at time' time'!'
-     end
-
- Simple, isn't it? It is just that it only works on some machines, and
- even then only sometimes. Why would such a program write out:
-
- jorgens is logged in from at time 10:51 (129.241.27.23:0.)!
- anders is logged in from at time 09:05 (129.241.36.3:0.0)!
-
- The answer is, there is a tab between the time and the hostname. Whether
- this rexx script works depends on the machine type, the terminal
- type, the mood of your system operator and a lot of other conditions.
- Prohibiting the tab as a blank does not *help* the user in this
- situation; in fact, it will only confuse.
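
For comparison, this is roughly what a script has to do when the interpreter does not treat the tab as a blank: convert tabs to spaces by hand before parsing. The sketch below assumes ASCII ('09'x as the tab) and uses a made-up sample line instead of real 'who' output, with the tab placed between the time and the hostname as in the example above.

```rexx
/* Sketch: normalise tabs to spaces before word-parsing a line.      */
/* The sample line is made up; '09'x is the tab on ASCII systems.    */
tab  = '09'x
line = 'anders   ttyp3   Aug 26 09:05' || tab || '(129.241.36.3:0.0)'

line = translate(line, ' ', tab)   /* every tab becomes one space    */

/* The trailing '.' makes node a clean word with no leading blanks.  */
parse var line user . . . time node .
say user 'is logged in from' node 'at time' time'!'
```

Note that this workaround hard-codes a character code, which is precisely the kind of non-portable assumption that would be unnecessary if the interpreter accepted the host system's blanks.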
-