- This is an explanation of the action of Pascal I/O, as applied to text
- files. A system meeting the ISO and ANSI standards is assumed. This does
- not apply to Turbo Pascal exactly, because Turbo omits some of the standard
- abilities and functions, especially for console input. UCSD Pascal fails
- in console i/o, but other operations are implemented. PascalP functions
- exactly as described below.
-
- Any Pascal file is conceptually a single stream, with a file buffer var-
- iable. If we always refer to the file variable itself as "f", the buffer
- variable is "f^". If f is declared as "f : FILE OF thing", then f^ is of
- type "thing", and may be used as such a variable at any time the file is
- open (i.e. after the file has been reset or rewritten).
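-
- As a minimal sketch of the buffer variable in use, the program below
- writes three integers through f^ and reads them back the same way. It
- assumes an ISO-conforming compiler that supplies temporary storage for a
- file variable not named in the program header; the program name and the
- variable i are illustrative only.
-
```pascal
program BufferDemo(output);
{ Hypothetical example: f is a local (temporary) file of integer. }
var
  f : file of integer;
  i : integer;
begin
  rewrite(f);                 { open for writing; f^ is undefined }
  for i := 1 to 3 do
  begin
    f^ := i * i;              { f^ is an ordinary integer variable }
    put(f)                    { append f^ to the file }
  end;
  reset(f);                   { open for reading; f^ holds the first item }
  while not eof(f) do
  begin
    writeln(f^);              { use the current item }
    get(f)                    { advance to the next item }
  end
end.
```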
-
- A Pascal text file is equivalent to "PACKED FILE OF char", and additionally
- allows the eoln function and the readln and writeln procedures to be used.
- THESE MAY NOT BE USED ON A NON-TEXT FILE.
-
- For reading, a file at any time consists of two ordered arrays of items.
- The first is the portion that has already been input, and the second is the
- portion that has not been input yet. The buffer variable f^ always holds
- the last single item input (for a text file, a character, an eoln mark, or
- the eof mark). The eoln mark always appears as a space in f^, and can only
- be detected by the eoln function. The eof mark in any
- non-empty text file must immediately follow an eoln mark (specified by the
- standard). (Thus any good system will automatically append an eoln on
- closing a file, if and only if it is not already present.) The second
- portion of the file is unlimited, and unknown as yet to the Pascal program.
-
- When a file is "reset" the file is actually opened, and the first char is
- placed in f^ (this may be the eof or eoln mark, checked by eof/eoln func-
- tions). This first char is removed from the second portion.
-
- From here on, the action of the "get(f)" procedure is to advance one
- further character in the source file, discarding the old f^ value, and
- replacing it with the next char. It should always be an error to do this
- when eof is true.
-
- Note that nothing has yet affected any variable in the Pascal program,
- except the f^ buffer. These are the underlying functions of the input
- system. The program may use the file by such actions as "ch := f^" at any
- time.
-
- The action of "read(f, ch)" is STRICTLY defined as "ch := f^; get(f)", and
- the eoln and eof functions examine the non-visible characteristics of the
- last input character. If "f" is omitted, as in "read(ch)" the standard
- file "input" is assumed, and the buffer variable is "input^".
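-
- The equivalence can be checked with a short sketch. It assumes an ISO-
- conforming compiler and uses a temporary text file t (a hypothetical
- name) so that no keyboard input is needed.
-
```pascal
program ReadDemo(output);
{ Hypothetical example: t is a temporary text file holding the line 'ab'. }
var
  t      : text;
  c1, c2 : char;
begin
  rewrite(t);
  writeln(t, 'ab');
  reset(t);                   { t^ = 'a' }
  read(t, c1);                { c1 = 'a', t^ = 'b' }
  c2 := t^; get(t);           { same effect as read(t, c2): c2 = 'b' }
  if eoln(t) then             { t^ is now the eoln mark, seen as a space }
    writeln(c1, c2, ' then end of line')
end.
```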
-
- For most CPM or MSDOS systems the file actually contains a <cr> to mark
- eoln, and a <^Z> to mark eof. The value of f^ when eof is true is not
- defined by the standards, but when eoln is true it should be a space. Thus
- the <cr> character can not appear (unless the system defines eoln as the
- <cr,lf> pair). Some systems always discard any <lf>, so that the file
- action remains the same when input from a keyboard as when input from a
- disk file.
-
- The form "read(f, ch1, ch2, ..)" is defined as "read(f,ch1);
- read(f,ch2); .... ", and is simply a shorthand. If the object read-into is
- an integer, or a real, then automatic conversion is performed from a text
- string, and at completion f^ holds the terminating character (space, non-
- numeric, etc). Such a read causes a run-time error when no valid integer
- etc. is found before a terminator, but leading blanks (and eolns) are
- skipped over.
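-
- A small sketch of a numeric read, assuming a conforming system and a
- temporary text file t (hypothetical name). The number sits on the second
- line behind some blanks, so both an eoln and leading blanks are skipped,
- and afterwards t^ holds the blank that ended the number.
-
```pascal
program ReadIntDemo(output);
{ Hypothetical example: line 1 of t is empty, line 2 is '   42 rest'. }
var
  t : text;
  n : integer;
begin
  rewrite(t);
  writeln(t);
  writeln(t, '   42 rest');
  reset(t);
  read(t, n);                 { skips the eoln and blanks, converts '42' }
  writeln('n = ', n : 1);
  if t^ = ' ' then            { the terminator is left in t^, not consumed }
    writeln('terminator is a blank')
end.
```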
-
- Notice that nothing so far controls any flushing of input lines, to ensure
- that a read starts on the next physical line. This is performed by
- "readln(f)", which is defined as "WHILE NOT eoln(f) DO get(f); get(f)".
- NOTE the final get. This always leaves f^ holding the first character of
- the next line (which is a space if the next line is empty, i.e. consists of
- eoln alone), or possibly an eof mark. Again, an omitted "f" implies input.
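-
- The definition can be written out as an ordinary procedure. This is a
- sketch only: SkipLine is a hypothetical name, and the temporary text file
- t assumes an ISO-conforming compiler.
-
```pascal
program SkipLineDemo(output);
var
  t  : text;
  ch : char;

{ Hypothetical helper with the same effect as readln(f). }
procedure SkipLine(var f : text);
begin
  while not eoln(f) do
    get(f);                   { discard the rest of the current line }
  get(f)                      { step over the eoln mark itself }
end;

begin
  rewrite(t);
  writeln(t, 'first line');
  writeln(t, 'second line');
  reset(t);
  read(t, ch);                { ch = 'f' }
  SkipLine(t);                { t^ is now 's', the start of line two }
  writeln('next line starts with ', t^)
end.
```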
-
- The form "readln(f, item1, item2, .. itemn)" is defined as
- "read(f,item1); read(f,item2); ... read(f,itemn); readln(f)", and is again
- just a convenient shorthand.
-
- This brings up the great bugaboo of Pascal text i/o: When a file is reset
- it MUST place the first character in f^. If that file is interactive (i.e.
- the keyboard) the first character must be typed at that time. Thus the
- natural sequence "reset(f); write('prompt message'); read(f, ch)" to get a
- reply to a prompt requires that the answer be typed before the prompt is
- made. The problem also reappears after any readln, because the first "get"
- from the next line is performed. (see below for why f^ is filled at all)
-
- This is normally cured by a special driver for text files. Whenever the
- "get" is executed it simply sets a flag somewhere (totally invisible to the
- application program) which says "a get is pending". (If get finds the flag
- set it must perform the pending get, and then again set the flag). Note
- that the "get" may be implied by a reset, read, or readln operation. Now
- the system must again intercept any use of eoln, eof, or the f^ variable
- and, before actually executing them, check the "get_pending" flag. If set
- the actual get must be performed, the flag cleared, and then the eoln, eof,
- f^ references may be made. This prevents the early physical read, and
- allows natural programming. However the programmer should always remember
- that any reference to eof, eoln, or f^ will cause the physical read. Thus
- the sequence "reset(f); IF eof(f) THEN something; write('prompt');
- read(f,ch)" will cause the physical read to be too early.
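-
- The pending-get scheme can be sketched as a simulation. This is not how
- any particular run-time system is written: the record LazyFile, its
- fields, and every procedure name below are hypothetical, and a fixed
- string stands in for the keyboard. The counter nread shows that no
- physical read happens until the buffer is actually examined.
-
```pascal
program LazyGetDemo(output);
{ Sketch only: simulates the "get pending" scheme on a fixed string. }
const
  SrcLen = 5;
type
  LazyFile = record
    src     : packed array [1..SrcLen] of char;  { stands in for the keyboard }
    nread   : integer;      { characters physically read so far }
    buf     : char;         { plays the role of f^ }
    pending : boolean       { true when a get has been deferred }
  end;
var
  f  : LazyFile;
  ch : char;

{ The real, possibly blocking, read of one character. }
procedure PhysicalGet(var f : LazyFile);
begin
  f.nread := f.nread + 1;
  if f.nread <= SrcLen then
    f.buf := f.src[f.nread]
  else
    f.buf := ' '            { past the end: value is arbitrary }
end;

{ Carry out a deferred get, if any, before buf or eof is examined. }
procedure Force(var f : LazyFile);
begin
  if f.pending then
  begin
    PhysicalGet(f);
    f.pending := false
  end
end;

procedure LazyGet(var f : LazyFile);
begin
  Force(f);                 { a second get must first perform the pending one }
  f.pending := true
end;

procedure LazyReset(var f : LazyFile);
begin
  f.nread := 0;
  f.pending := true         { the get implied by reset is only recorded }
end;

function LazyBuf(var f : LazyFile) : char;
begin
  Force(f);
  LazyBuf := f.buf
end;

function LazyEof(var f : LazyFile) : boolean;
begin
  Force(f);
  LazyEof := f.nread > SrcLen
end;

procedure LazyRead(var f : LazyFile; var ch : char);
begin
  ch := LazyBuf(f);         { ch := f^ }
  LazyGet(f)                { get(f), deferred }
end;

begin
  f.src := 'hello';
  LazyReset(f);
  writeln('after reset, physical reads = ', f.nread : 1);
  write('prompt> ');
  LazyRead(f, ch);
  writeln;
  writeln('read ', ch, ', physical reads = ', f.nread : 1);
  if not LazyEof(f) then    { this test forces the deferred get }
    writeln('after eof test, physical reads = ', f.nread : 1)
end.
```
-
- The last two lines of the main program show the caveat from the text:
- merely testing eof forces the deferred read to take place.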
-
- Some systems do not follow the ANSI/ISO standard, and define a special
- interactive file type where read(f, ch) is defined as "get(f); ch := f^".
- This causes all sorts of problems, because the programmer must always know
- that this file is interactive, and programs cannot use the standard input
- and disk files interchangeably.
-
- The "get" is normally executed on reset (or readln) so that the value of
- eoln and eof is available after using a character (by read), and so that
- the program can look ahead to the next character. This allows decisions to
- be made, e.g. if the following character is numeric, read a number; if it
- is alphabetic, read a char; if it is a special character, read a user
- command, etc. Thus a file copy program such as:
-
-     WHILE NOT eof DO BEGIN
-       WHILE NOT eoln DO BEGIN
-         read(ch); write(ch); END;
-       readln; writeln; END;
-
- works naturally. The read/write line can be replaced by
-
-         write(input^); get(input); END;
-
- with the same action and no auxiliary variable, or by some sort of filter
- such as
-
-         IF input^ <> ' ' THEN write(input^);
-         get(input); END;
-
- to strip out all blanks. Such a fragment can copy the standard input to
- standard output, and works correctly with any i/o redirection applied.
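-
- The look-ahead decisions described above can be sketched as a complete
- program. It assumes a conforming system in which input^ is accessible;
- the program name and the classification into "number" and "character"
- are illustrative only. Note that eoln is tested first, since input^ is
- just a space at the end of a line.
-
```pascal
program LookAhead(input, output);
{ Hypothetical sketch: classify each input line by its first character. }
var
  n  : integer;
  ch : char;
begin
  while not eof do
  begin
    if eoln then
      writeln('empty line')
    else if input^ in ['0'..'9'] then
    begin
      read(n);                { a digit is waiting: read a number }
      writeln('number ', n : 1)
    end
    else
    begin
      read(ch);               { anything else: take one character }
      writeln('character ', ch)
    end;
    readln                    { flush the rest of the line }
  end
end.
```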
-
- NOTE that "reset(input)" is always automatically performed when a program
- begins running, and similarly "rewrite(output)". Thus such statements
- should normally not appear in a program.
-
- Think of readln as a line-flushing procedure, but bear in mind that
- "readln(item)" is always equivalent to "read(item); readln".
-
- For output, write(f, item1, item2, .. itemn) is defined as "write(f,item1);
- write(f, item2); ... write(f, itemn)", and "writeln(f, item)" is defined as
- "write(f, item); writeln(f)". Both of these are again shorthand. The
- writeln procedure alone (i.e. writeln(f) ) simply puts an eoln mark into
- the file being written. If the "f" specification is omitted the output
- goes to the standard file "output" by default.
-
- Again, the fundamental writing procedure is "put(f)", which causes the
- content of f^ to be appended to the end of the file f. "write(f, item)" is
- STRICTLY defined as "f^ := item; put(f)", and should be unable to create
- the eoln mark in a text file (reserved for writeln). The action of
- "rewrite(f)" is to empty any old version of f, and leave f^ undefined. f^
- is also undefined after any write operation. Thus doing nothing except
- "rewrite(f)" in a program should leave f as an empty file, but existing.
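-
- A short sketch of the output side, again assuming an ISO-conforming
- compiler and a temporary text file t (hypothetical name). One character
- is written with write and one with an explicit assignment and put; the
- line is then read back and echoed.
-
```pascal
program PutDemo(output);
{ Hypothetical example: builds the line 'Hi' in t and echoes it. }
var
  t : text;
begin
  rewrite(t);                 { t is now empty, t^ undefined }
  write(t, 'H');              { the usual way }
  t^ := 'i'; put(t);          { same effect as write(t, 'i') }
  writeln(t);                 { only writeln produces the eoln mark }
  reset(t);
  while not eoln(t) do
  begin
    write(t^);                { echo the stored line to output }
    get(t)
  end;
  writeln
end.
```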
-
- All Pascal files should be automatically closed when the defining program
- (or procedure for a local file) is exited. Some systems provide a "close"
- procedure to force an early close for one reason or another (e.g. to
- release a locked file to another user in a multi-process environment). If
- a file was open for write (via rewrite), and is later "reset", an automatic
- close is done. These closings of a written file append the eof mark, and
- force any system buffers to be flushed. Some systems are incomplete, and
- actually require that a specific call to "close" be made. This procedure
- is non-standard, and such programs will not be portable.
-
- Again, this is how it should work according to international (and ANSI)
- standards. Some systems do not meet the standards - beware.
-
- For Turbo Pascal users, I have written a set of includable procedures (see
- TURBOFIX.LBR) which make Turbo meet these standards, although you will have
- to use non-standard procedure names.
-
- I hope this clears up some confusion. C.B. Falconer 85/9/11, 87/2/12