Text File | 1994-07-13 | 9.5 KB | 169 lines

This is an explanation of the action of Pascal I/O, as applied to text files. A system meeting the ISO and ANSI standards is assumed. This does not apply exactly to Turbo Pascal, because Turbo omits some of the standard abilities and functions, especially for console input. UCSD Pascal fails for console I/O, but other operations are implemented. PascalP functions exactly as described below.

Any Pascal file is conceptually a single stream, with a file buffer variable. If we always refer to the file variable itself as "f", the buffer variable is "f^". If f is declared as "f : FILE OF thing", then f^ is of type "thing", and may be used as such a variable at any time the file is open (i.e. after the file has been reset or rewritten). A Pascal text file is equivalent to "PACKED FILE OF char", and additionally specifies that the eoln function and the readln and writeln procedures may be used. THESE MAY NOT BE USED ON A NON-TEXT FILE.

For reading, a file at any time consists of two ordered arrays of items. The first is the portion that has already been input, and the second is the portion that has not been input yet. The buffer variable f^ always contains the last single item input (for text files an item is a character, an eoln mark, or an eof mark). The eoln mark always appears as a space in f^, and may only be detected by the eoln function. The eof mark in any non-empty text file must immediately follow an eoln mark (this is specified by the standard). (Thus any good system will automatically append an eoln on closing a file, if and only if it is not already present.) The second portion of the file is unlimited, and as yet unknown to the Pascal program.

When a file is "reset" the file is actually opened, and the first character is placed in f^ (this may be the eof or eoln mark, checked by the eof and eoln functions). This first character is removed from the second portion.
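To make the two-portion picture concrete, here is a small Python model of a text file opened for reading. It is a sketch, not any real Pascal runtime: the class TextFile and its members are hypothetical names mirroring reset, get, f^ (the buf property), eoln and eof. Each input line is assumed to be followed by an eoln mark, so the eof of a non-empty file always comes right after an eoln. Treating eoln at eof as an error is this sketch's own choice.

```python
# Hypothetical model of an ISO Pascal text file opened for reading.
# Names mirror the Pascal operations; this is not a real runtime.

EOLN = object()  # end-of-line mark, distinct from every character


class TextFile:
    def __init__(self, lines):
        # The not-yet-input portion: each line's characters followed by
        # an eoln mark. Running off the end of the list is eof.
        self._items = []
        for line in lines:
            self._items.extend(line)
            self._items.append(EOLN)
        self._pos = None  # None means "not open"

    def reset(self):
        # Open the file and place the first item (or eof) in f^.
        self._pos = 0

    def _check_open(self):
        if self._pos is None:
            raise ValueError("file is not open")

    def eof(self):
        self._check_open()
        return self._pos >= len(self._items)

    def eoln(self):
        if self.eof():
            raise EOFError("eoln(f) treated as an error when eof(f) is true")
        return self._items[self._pos] is EOLN

    @property
    def buf(self):
        # f^: an eoln mark always reads as a space; undefined at eof.
        if self.eof():
            raise EOFError("f^ is undefined when eof(f) is true")
        item = self._items[self._pos]
        return " " if item is EOLN else item

    def get(self):
        # Discard the old f^ and move the next item into it.
        if self.eof():
            raise EOFError("get(f) is an error when eof(f) is true")
        self._pos += 1
```

For a file holding "ab" followed by an empty line, reset leaves f^ = 'a'; after two gets f^ shows a space and eoln is true, and after two more gets eof is true.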
From here on, the action of the "get(f)" procedure is to advance one further character in the source file, discarding the old f^ value and replacing it with the next character. It should always be an error to do this when eof is true. Note that nothing has yet affected any variable in the Pascal program, except the f^ buffer. These are the underlying functions of the input system.

The program may use the file by such actions as "ch := f^" at any time. The meaning of "read(f, ch)" is STRICTLY defined as "ch := f^; get(f)", and the eoln and eof functions examine the non-visible characteristics of the last input character. If "f" is omitted, as in "read(ch)", the standard file "input" is assumed, and the buffer variable is "input^".

For most CP/M or MS-DOS systems the file actually contains a <cr> to mark eoln, and a <^Z> to mark eof. The value of f^ when eof is true is not defined by the standards, but when eoln is true it should be a space. Thus the <cr> character cannot appear in f^ (unless the system defines eoln as the <cr,lf> pair). Some systems always discard any <lf>, so that the file action remains the same when input comes from a keyboard as when it comes from a disk file.

The meaning of "read(f, ch1, ch2, ...)" is defined as "read(f, ch1); read(f, ch2); ...", and is simply a shorthand. If the object read into is an integer or a real, then automatic conversion is performed from a text string, and at completion f^ holds the terminating character (space, non-numeric, etc.). Such a read causes a run-time error when no valid integer etc. is found before a terminator, but leading blanks (and eolns) are skipped over.

Notice that nothing so far controls any flushing of input lines, to ensure that a read starts on the next physical line. This is performed by "readln(f)", which is defined as "WHILE NOT eoln(f) DO get(f); get(f)". NOTE the final get. This always leaves f^ holding the first character of the next line (which is a space if the next line is empty, i.e. consists of eoln alone), or possibly an eof mark. Again, an omitted "f" implies input. The meaning of "readln(f, item1, item2, ... itemn)" is defined as "read(f, item1); read(f, item2); ... read(f, itemn); readln(f)", and is again just a convenient shorthand.

This brings up the great bugaboo of Pascal text I/O: when a file is reset it MUST place the first character in f^. If that file is interactive (i.e. the keyboard) the first character must be typed at that time. Thus the natural sequence "reset(f); write('prompt message'); read(f, ch)" to get a reply to a prompt requires that the answer be typed before the prompt is made. The problem also reappears after any readln, because the first "get" from the next line is performed. (See below for why f^ is filled at all.)

This is normally cured by a special driver for text files. Whenever a "get" is executed it simply sets a flag somewhere (totally invisible to the application program) which says "a get is pending". (If get finds the flag already set it must perform the pending get, and then set the flag again.) Note that the "get" may be implied by a reset, read, or readln operation. The system must then intercept any use of eoln, eof, or the f^ variable and, before actually executing them, check the "get_pending" flag. If it is set, the actual get must be performed and the flag cleared; only then may the eoln, eof, or f^ reference be made. This prevents the early physical read, and allows natural programming. However, the programmer should always remember that any reference to eof, eoln, or f^ will cause the physical read. Thus the sequence "reset(f); IF eof(f) THEN something; write('prompt'); read(f, ch)" will cause the physical read to happen too early.

Some systems do not follow the ANSI/ISO standard, and define a special interactive file type where read(f, ch) is defined as "get(f); ch := f^".
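The standard "get pending" driver can be modelled in the same way. In this Python sketch an iterator of keystrokes stands in for the keyboard ("\n" being the key that ends a line), and keys_read counts physical reads. LazyTextFile, read_char and readln are hypothetical names; read_char and readln are written strictly from the standard definitions ("ch := f^; get(f)" and "WHILE NOT eoln(f) DO get(f); get(f)"), and only the driver underneath them defers each get.

```python
# Hypothetical model of a text-file driver with a "get pending" flag.
# The keystroke iterator stands in for a keyboard; "\n" ends a line.

EOLN = "\n"


class LazyTextFile:
    def __init__(self, keystrokes):
        self._keys = iter(keystrokes)
        self._item = None          # what f^ really holds
        self._at_eof = False
        self._get_pending = False  # invisible to the application
        self.keys_read = 0         # number of physical reads so far

    def _physical_get(self):
        try:
            self._item = next(self._keys)
            self.keys_read += 1
        except StopIteration:
            self._item, self._at_eof = None, True

    def _sync(self):
        # Perform a deferred get before eof, eoln or f^ is examined.
        if self._get_pending:
            self._get_pending = False
            self._physical_get()

    def reset(self):
        self._at_eof = False
        self._get_pending = True   # the get implied by reset is deferred

    def get(self):
        self._sync()               # an older pending get is done first
        if self._at_eof:
            raise EOFError("get(f) is an error when eof(f) is true")
        self._get_pending = True

    def eof(self):
        self._sync()
        return self._at_eof

    def eoln(self):
        if self.eof():
            raise EOFError("eoln(f) treated as an error when eof(f) is true")
        return self._item == EOLN

    @property
    def buf(self):
        if self.eof():
            raise EOFError("f^ is undefined when eof(f) is true")
        return " " if self._item == EOLN else self._item

    # The standard definitions, built only on f^, get and eoln.
    def read_char(self):
        ch = self.buf   # ch := f^
        self.get()      # get(f)
        return ch

    def readln(self):
        while not self.eoln():
            self.get()
        self.get()
```

Fed the keystrokes "y\nmore\n", reset reads nothing, so a prompt written next appears before any key is needed; read_char then reads exactly one key, and readln reads the eoln key but leaves the 'm' of the next line unread until f^ is examined. Calling eof(f) straight after reset, as in the sequence above, would force the physical read early. The non-standard alternative mentioned above instead redefines read for a special interactive file type so that the get comes first.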
Such an interactive file type causes all sorts of problems, because the programmer must always know that the file is interactive, and programs cannot use the standard input and disk files interchangeably.

The "get" is normally executed on reset (or readln) so that the values of eoln and eof are available after using a character (by read), and so that the program can look ahead to the next character. This allows decisions to be made: e.g. is the following character numeric? then read a number; is it alphabetic? then read a char; is it a special character? then read a user command, etc. Thus a file copy program such as:

     WHILE NOT eof DO BEGIN
        WHILE NOT eoln DO BEGIN
           read(ch); write(ch);
        END;
        readln; writeln;
     END;

works naturally. The "read(ch); write(ch);" line can be replaced by

           write(input^); get(input);

with the same action and no auxiliary variable, or by some sort of filter such as

           IF input^ <> ' ' THEN write(input^); get(input);

to strip out all blanks. Such a fragment can copy the standard input to the standard output, and works correctly with any I/O redirection applied. NOTE that "reset(input)" is always automatically performed when a program begins running, and similarly "rewrite(output)". Thus such statements should normally not appear in a program. Think of readln as a line-flushing procedure, but bear in mind that "readln(item)" is always equivalent to "read(item); readln".

For output, "write(f, item1, item2, ... itemn)" is defined as "write(f, item1); write(f, item2); ... write(f, itemn)", and "writeln(f, item)" is defined as "write(f, item); writeln(f)". Both of these are again shorthand. The writeln procedure alone (i.e. "writeln(f)") simply puts an eoln mark into the file being written. If the "f" specification is omitted, the write is sent to the "output" file by default. Again, the fundamental writing procedure is "put(f)", which causes the content of f^ to be appended to the end of the file f.
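The output side can be sketched in the same style. In this hypothetical Python model (Writer and its methods are illustrative names, not a real API), put appends f^ and leaves it undefined, write_char reduces to "f^ := ch; put(f)", writeln alone appends an eoln mark, and close supplies a final eoln only if one is missing, then the eof mark, as a good system should. The on-disk encoding, <cr><lf> for eoln and ^Z for eof, is an assumption typical of CP/M and MS-DOS text files, not something the standard fixes.

```python
# Hypothetical model of a Pascal text file opened for writing.
# EOLN_BYTES and EOF_BYTE are an assumed CP/M-style disk encoding.

EOLN = object()          # the eoln mark; no ordinary char can equal it
EOLN_BYTES = "\r\n"      # assumed on-disk form of an eoln mark
EOF_BYTE = "\x1a"        # ^Z, assumed on-disk eof mark


class Writer:
    def __init__(self):
        self.disk = None     # file contents once closed
        self._items = None   # items written since rewrite (None: not open)
        self.buf = None      # f^ (None stands for "undefined")

    def rewrite(self):
        # Empty any old version of f and leave f^ undefined.
        self._items = []
        self.buf = None

    def put(self):
        # Append the content of f^ to the end of f.
        if self._items is None or self.buf is None:
            raise ValueError("put(f) needs an open file and a defined f^")
        self._items.append(self.buf)
        self.buf = None      # f^ is undefined after a write

    def write_char(self, ch):
        # write(f, ch) is f^ := ch; put(f). ch is one character, so this
        # can never produce the eoln mark (reserved for writeln).
        if len(ch) != 1:
            raise ValueError("write_char takes exactly one character")
        self.buf = ch
        self.put()

    def writeln(self):
        # writeln(f) alone just puts an eoln mark into the file.
        if self._items is None:
            raise ValueError("file is not open for writing")
        self._items.append(EOLN)

    def close(self):
        # Add a final eoln only if missing, then the eof mark, and flush.
        if self._items is None:
            raise ValueError("file is not open for writing")
        if self._items and self._items[-1] is not EOLN:
            self._items.append(EOLN)
        text = "".join(EOLN_BYTES if i is EOLN else i for i in self._items)
        self.disk = text + EOF_BYTE
        self._items = None
```

Writing "hi", a writeln, then "x" and closing gives hi<cr><lf>x<cr><lf>^Z: the missing final eoln is supplied before the eof mark. A file that is only rewritten and closed holds nothing but the eof mark, i.e. it exists but is empty.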
"write(f, item)" is STRICTLY defined as "f^ := item; put(f)" (for a char item; an integer or real is first converted to its characters), and should be unable to create the eoln mark in a text file (that is reserved for writeln). The action of "rewrite(f)" is to empty any old version of f, and leave f^ undefined. f^ is also undefined after any write operation. Thus doing nothing except "rewrite(f)" in a program should leave f as an existing, but empty, file.

All Pascal files should be automatically closed when the defining program (or procedure, for a local file) is exited. Some systems provide a "close" procedure to force an early close for one reason or another (e.g. to release a locked file to another user in a multi-process environment). If a file was open for writing (via rewrite), and is later "reset", an automatic close is done. These closings of a written file append the eof mark, and force any system buffers to be flushed. Some systems are incomplete, and actually require that a specific call to "close" be made. This procedure is non-standard, and such programs will not be portable.

Again, this is how it should work according to the international (and ANSI) standards. Some systems do not meet the standards; beware. For Turbo Pascal users, I have written a set of includable procedures (see TURBOFIX.LBR) which make Turbo meet these standards, although you will have to use non-standard procedure names. I hope this clears up some confusion.

C.B. Falconer 85/9/11, 87/2/12 P