The C Users' Group Library 1994 August

Text File | 1991-01-07 | 23KB | 464 lines

.* My thanks to Dick Dievendorff for fixing up this file to format with
.* GML. Now, if someone could do the same with SCLIB DOC . . .
:gdoc.
:frontm.
:titlep.
:title.Small-c Version 2.0
:title.for the IBM Personal Computer
:title.Release 1.01
:date.
:author.Daniel R. Hicks
:address.
:aline.
:aline.RCH38DB(HICKS)
:eaddress.
:etitlep.
:preface.
:p.Portions of Small-c code Copyright 1982 by J. E. Hendrix.
:p.Converted to the IBM Personal Computer by D. R. Hicks.
:p.This document was developed entirely by D. R. Hicks, and is
:hp1.not:ehp1. copyrighted.
:body.
:h1.Why Small-c?
:p.
Why bother with a limited version of the C programming language when
complete versions of the language are available? The main reason is
price. The cheapest "full" C implementations run about $75, and prices
increase from there up to the $500 range. Small-c is free, and this
means that many users who would not want to pay for a full
implementation will have access to this one. Other users who may not be
sure that they want to pay for a full C implementation can use this one
to determine if they like the language.
:p.
Another reason for the existence of Small-c is control: You get all of
the source for Small-c, and you can, if you wish, maintain or modify it
yourself. You can also change the I/O support package to suit your
needs, and you can even port the compiler to a different processor if
you wish (this version is converted from an 8080 version).
:h1.What can Small-c do?
:p.
Small-c takes a source file, consisting of C-subset statements, and
creates an assembler source file. This assembler source file can be
processed by either the IBM Small Assembler or the IBM Macro Assembler
to produce an IBM/Intel format OBJ file. The OBJ file can, in turn, be
combined with other OBJ files and LIB files via the DOS LINK function to
produce an executable EXE module. An I/O & utility function library is
provided for this purpose, implementing an environment which quite
closely mimics the UNIX environment.
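:p.
As an illustration of the "C-subset statements" mentioned above, the
following is a minimal sketch of a source fragment that uses only
integers, characters, pointers, and functions returning integers. The
function names length and total are hypothetical, chosen for this
example. The function headers are written in the present-day style so
that the fragment compiles with a current C compiler; whether Small-c
accepts this header style unchanged is an assumption that this document
does not confirm.

```c
/* Hypothetical example: count the characters in a string,
   using only a char pointer and an int counter. */
int length(char *s)
{
    int n;

    n = 0;
    while (*s) {        /* stop at the terminating zero byte */
        ++s;
        ++n;
    }
    return n;
}

/* Hypothetical example: add up the first count elements
   of an integer array, reached through an int pointer. */
int total(int *v, int count)
{
    int i, sum;

    sum = 0;
    for (i = 0; i < count; ++i)
        sum = sum + v[i];
    return sum;
}
```

:p.
With these definitions, length("Small-c") returns 7.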
:p.
This version of Small-c can be executed on an IBM PC with 128K of
storage, one single-sided diskette drive, and running IBM DOS 1.1 or
2.0. In all likelihood, the compiler will execute on a 96K system, but
this has never been tried. Since at least 96K is needed to run either of
the assemblers, there is little merit in executing in less than 96K.
Although dual diskette drives are necessary for large compilations
(where the assembler source can occupy most of one diskette), they are
useful but not essential otherwise. Where a hard file is available, this
may be used in place of diskettes. The compiler runs equally well on DOS
1.1 and DOS 2.0, although it does not utilize the increased function of
DOS 2.0. Release 1.01 added arrays of up to two dimensions for all
supported data types (D. Lang 12/90).
:p.
Programs created with the compiler (and suitably assembled and linked)
can be executed on any IBM DOS system with sufficient storage. Normally,
the programs must be linked with the associated Small-c library which,
in addition to providing I/O support, sets up the operating environment.
However, a reasonably adept programmer should be able to link Small-c
procedures with procedures from other compilers, if necessary coding
assembler "glue" via the Small-c "#asm" statement.
:p.
Although it has never (to my knowledge) been attempted, there is no
known reason why this version of Small-c (and the programs it compiles)
should not execute equally well on any suitably configured IBM
PC-compatible MS DOS system.
:h1.What WON'T it do?
:p.
Small-c was originated by Ron Cain, and was originally published in Dr.
Dobb's Journal, number 45. The original intent was to provide users of
small microprocessor systems with a compact yet powerful systems
programming language, one which could be both used and maintained on a
small system. For this reason, many of the features of full C
implementations were omitted, resulting in a restricted but usable
language.
Small-c Version 2.0 was developed by J. E. Hendrix and published in Dr.
Dobb's Journal, numbers 74 & 75. This version, slightly larger to
account for the tendency toward larger systems, has fewer restrictions
and is considerably more powerful. Nonetheless, it has significant
restrictions relative to full C implementations. The following are the
principal restrictions:
:ol.
:li.Structures and unions are not supported.
:li.Arrays of more than two dimensions are not supported
(two-dimensional arrays were added in Release 1.01).
:li.Floating point is not supported.
:li."Long" integers are not supported.
:li.Only functions returning integers are supported.
:li.Pointers to pointers, arrays of pointers, and several of the other
exotic forms of declarations are not supported.
:eol.
:h1.Running Small-c
:p.
The compiler may be invoked with the parameters specified on the command
line, with prompting for the parameters, or a combination of the two. To
invoke the compiler with prompting, type:
:sl.
:li.CC
:esl.
:pc.The compiler will display the prompt:
:sl.
:li.Input file [CON.C]:
:esl.
:pc.Type in the name of the file containing your C program. A filename
extension of "C" is assumed. If you enter no filename, input will be
accepted from the keyboard.
:p.The compiler will then display the prompt:
:sl.
:li.Output file [CON.ASM]:
:esl.
:p.
Type in the file name of the file to contain the assembler source. A
filename identical to the source file name and an extension of "ASM" is
assumed.
:p.
The compiler will then display the prompt:
:sl.
:li.Listing file [NUL.LST]:
:esl.
:p.
Normally, the listing file is either suppressed (by routing to the NUL
device) or is routed to the printer (by specifying "prn", for example).
If no filename is specified, NUL is assumed.
:pc.
The compiler then asks the following yes/no questions:
:sl.
:li.Interleave C source?
:li.Monitor function headers?
:li.Sound alarm on errors?
:li.Pause on errors?
:esl.
:p.
"Interleave C source" means to place the C source statements into the
assembler source as comments.
This is very useful if the assembler source is to be examined or
modified after compilation, but it increases the size of the assembler
source file (by slightly more than the size of the C source file).
:p.
"Monitor function headers" means to display each function header (and
include statement) as it is processed. This is useful both for
monitoring the progress of the compilation and for interpreting the
context of error messages from the compiler. This question is not asked
(and function headers and include statements are not displayed) if
output is being routed to the display screen.
:p.
"Sound alarm on errors" means to sound the PC's "bell" (beeper) whenever
an error message is displayed (also, when the compilation ends). This is
useful for long compilations where one might wish to leave the computer
unattended but be alerted when attention was needed.
:p.
"Pause on errors" means to stop after displaying each error message and
wait for an "ENTER" before proceeding. This prevents the error message
from being scrolled off the screen before it can be examined.
:p.
After these questions have been answered, the compilation will begin. C
source statements will be read from the input file and written to the
output file. Function headers and include statements will be displayed
if selected. When the compilation is finished, the following message is
displayed (where n is a number):
:sl.
:li.There were n errors in this compilation
:esl.
:p.
If the number of errors is zero (or if you are willing to accept
whatever the errors are that have occurred), you may process the
produced assembler source with ASM (IBM Small Assembler) or MASM (IBM
Macro Assembler). Once the program has been successfully compiled and
assembled, it should be linked, using the DOS LINK program, with the
CC.LIB library supplied, thereby producing the executable EXE file.
:p.
When parameters are specified via the DOS command line, the file names
must be entered in order -- source file, assembler file, listing file --
followed by or interspersed with the processing options. Roughly, the
syntax is:
:sl.
:li.CC [<source_file> [<assembler_file> [<listing_file> ]]][-[n]<option> . . .][;]
:esl.
:p.
Each file name, option, or the ";", must be separated by spaces from
adjacent entries. The options are identified by a leading "-" and
contain an optional "n" which indicates the "not" of the option. The
options are:
:sl.
:li.i -- interleave C source
:li.m -- monitor function headers
:li.a -- sound alarm on errors
:li.p -- pause on errors
:esl.
:p.
The ";" indicates that defaults are to be taken for the remaining
options or file names (rather than prompting for them).
:h1.Data representations
:p.
Small-c recognizes seven different data types:
:sl.
:li.Integers
:li.Characters
:li.Integer arrays
:li.Character arrays
:li.Integer pointers
:li.Character pointers
:li.Integer functions
:esl.
:pc.
No other combinations are recognized.
:p.
Integers are signed binary numbers ranging between -32768 and +32767 and
occupying 16 bits (two bytes) of data storage on a byte boundary. The
low-order eight bits of the number are stored in the byte with the lower
address.
:p.
Characters are signed binary numbers ranging between -128 and +127 and
occupying 8 bits (one byte) of data storage on a byte boundary.
:p.
Integer arrays can have up to two dimensions of 16 bit signed integers.
Each element occupies two bytes of data storage on a byte boundary, with
the low-order eight bits of each element occupying the byte with the
lower address. The first element (the one corresponding to an index
value of zero) occupies the two-byte area with the lowest address, the
next element occupies the adjacent higher address, etc.
:p.
Character arrays can have up to two dimensions of 8 bit signed integers.
Each element occupies one byte of data storage on a byte boundary. The
first element (the one corresponding to an index value of zero) occupies
the one-byte area with the lowest address, the next element occupies the
adjacent higher address, etc.
:p.
Integer and character pointers are both 16 bit unsigned numbers, ranging
between 0 and 65535 and occupying two bytes of data storage on a byte
boundary. Like integers, the low-order eight bits of the number are
stored in the byte with the lower address. The only distinction between
the two is in the semantics of their use: Dereferencing (via "*") an
integer pointer causes a two byte quantity to be fetched or stored,
while dereferencing a character pointer causes a one byte quantity to be
fetched or stored. Similarly, incrementing (decrementing) an integer
pointer causes two to be added to (subtracted from) the pointer, while
incrementing (decrementing) a character pointer causes one to be added
to (subtracted from) the pointer.
:p.
Functions are variable length areas of code storage beginning on a byte
boundary. Functions are presumed to consist of 8088 instructions
arranged in a sequence which is meaningful and which conforms to Small-c
linkage and usage protocols.
:p.
The C language is fairly unusual among "modern" languages in that the
value of a named entity (i.e., variable or function) is the contents of
the entity only when the entity is a scalar. In other cases, the value
of the entity is its address. Thus, the value of an integer or character
array is the offset in the data segment to the first element (the one
with an index of zero) and the value of a function (when its name is not
followed by a parameter list) is the offset in the code segment to the
entry point of the function.
:p.
Since C is a loosely-typed language, much of the semantics of a named
entity is dependent upon the context of its use.
Any of the above data types may be dereferenced with a "*", for instance
(although dereferencing a character variable or a function name is
guaranteed to be meaningless -- and dangerous). Likewise, any of the
above data types may be treated as a function and called by appending a
parameter list. (In this case, character variables and array names are
meaningless -- and even more dangerous.)
:p.
Another unusual aspect of C is that character values are extended to
integer values when used in an expression, when passed as an argument,
or when referenced by a switch or return statement. The C language
standard permits this conversion to be done either via sign extension,
or by filling the high-order bits of the integer with zeros. Small-c
uses sign extension. Thus an expression such as:
:sl.
:li.c == 255
:esl.
:pc.(where c is a character variable) will never be true.
:h1.Language features and restrictions
:p.
In addition to the restrictions stated above, the following restrictions
hold:
:sl.
:li.Lower-case and upper-case symbols are synonymous.
:li.Local declarations at a block level and goto statements may not be
used in the same function.
:li.The sizeof operator is not supported.
:li.The cast operator is restricted to four cases:
:sl.
:li.(int)
:li.(char)
:li.(int *)
:li.(char *)
:esl.
:pc.No extraneous blanks are allowed within the cast operator.
:li.Initializers are only permitted on global or static declarations,
and only literal values may be used for initializers.
:esl.
:h1.Storage and linkage conventions
:p.
This version of Small-c utilizes the "small" storage model: The CS
register addresses a single code segment (of up to 64K), while the DS,
ES, and SS registers address a single data segment (also of up to 64K).
The code segment comes first (lowest) in storage, followed immediately
by the data segment.
Static storage and string constants are allocated at the low end of the
data segment, with a heap growing up from the top of the statics, and a
stack growing down from the top of the data segment. The initialization
code contained in C.LIB initializes the data segment to be as large as
available storage, up to the 64K maximum. Although the function is so
far unused, provision is made for allocating I/O buffers above the data
segment when extra storage is available. In theory, the above allows a
single program to utilize up to 128K of storage, although, in practice,
one will usually use up the code segment roughly twice as fast as the
data segment, so that 96K (not counting DOS) is a more practical limit.
:p.
Subroutine linkage and automatic storage allocation utilize the stack.
To call a subroutine, parameters are first pushed onto the stack. In
standard C fashion, scalar parameters (char or int) are passed by value,
while arrays and strings are passed by address. Char and int values both
occupy two bytes on the stack. Char values are sign-extended to 16 bits.
Address values are also passed as two-byte quantities: The
interpretation of the address as an offset into the code segment or an
offset into the data segment depends on the way it is used. (In fact,
Small-c does not recognize pointers to functions as a data type, but
reference to a function name without following "()" yields its offset
into the code segment, and reference to a variable followed by "()"
results in a call to the location in the code segment indicated by the
variable.)
:p.
Parameters are pushed in order of occurrence: The first parameter in a
list is the first one pushed and therefore the deepest one in the stack.
This is opposite the order of many C compilers, and it prevents some C
library functions (such as "printf") from being able to determine how
many parameters are present by examining the first or second one.
For this reason, the compiler, prior to a CALL, loads register DL with
the parameter count, thus allowing functions such as "printf" to be
implemented. This feature (the loading of DL) can be disabled in cases
where it is not needed, thereby generating more compact code.
:p.
After the parameters have been pushed onto the stack, the function is
called (as a NEAR procedure). This causes the return address to be
pushed onto the stack. The called function then saves the current BP
register value by pushing it onto the stack, loads BP from the current
SP register value, and decrements SP by the size of automatic storage
needed. Thus, automatic storage can be addressed by using negative
offsets from BP, while parameters can be addressed using positive
offsets.
:p.
To return from a function, the result, if any, is first loaded into the
AX register. Then, SP is loaded from BP (to pop any automatic storage
present), and the old value of BP is restored by popping it from the
stack. Finally, a NEAR return is executed which pops the saved return
address from the stack and returns to the calling function. It is the
responsibility of the calling routine to pop the parameters from the
stack (this is an area of possible incompatibility with other
languages).
:p.
Although the effect can be negated by the inappropriate use of global or
static variables, all code generated by this compiler is reentrant.
:h1.Execution environment
:p.
It is the intent of this implementation to mimic the standard UNIX C
operating environment as accurately as possible and reasonable, given
the limitations of Small-c and the difference in operating systems.
Thus, :hp1.main:ehp1. is passed a pair of parameters, as in UNIX C, with
the first parameter indicating the number of arguments, and the second
parameter pointing to an array of integers which can be cast into
character pointers to the arguments.
These arguments are the blank-separated arguments parsed from the DOS
command line, except that the first argument (which in UNIX C is the
name of the program) is always "main".
:p.
Also optionally parsed from the command line are redirections of
:hp1.stdin:ehp1. and :hp1.stdout:ehp1.. If an argument on the command
line is preceded by the character "<", it is treated as the file name
for stdin rather than being added to the argument list. And if an
argument on the command line is preceded by the character ">", it is
treated as the file name for stdout rather than being added to the
argument list. (Caution: Under DOS 2.0, these arguments are intercepted
and the redirection is performed by DOS rather than by C.LIB.
Unfortunately, there are bugs in the DOS 2.0 redirection support.) If
not redirected, stdin, stdout, and stderr all refer to the
keyboard/display. stderr is not redirectable.
:h1.I/O system
:p.
It is the intent of this implementation to mimic the standard UNIX C I/O
interfaces as accurately as possible and reasonable, given the
limitations of Small-c and the difference in environment. Thus, both I/O
layers are implemented: The "standard I/O" layer and the UNIX system
call layer. The principal difference between the two layers is not so
much one of function as of performance: The standard I/O interface
provides an extra level of buffering which is useful to many
applications which process input or generate output a character at a
time, while the UNIX system call interface is more efficient when this
extra buffering is not needed. In addition, the UNIX system call
interface permits slightly more precise control over the I/O, while the
standard I/O interface functions are slightly easier to use when
processing character data.
:p.
As with UNIX C, access to the "standard I/O" layer generally requires
the user to include stdio.h, so that various values are defined and
various externals are declared.
Also as with UNIX C, the include errno.h contains error codes that are
stored in the external variable errno by the I/O routines.
:p.
The UNIX system call layer reserves the filenames "kbd", "scrn", "con",
and "prn" (all lower case) and treats files opened to these names
specially. The first three names are intercepted and routed to the
appropriate keyboard/display interfaces, while the fourth is routed to
the printer. All other file names are passed to DOS to be treated as DOS
disk/diskette file names or DOS reserved file names.
:p.
One unusual aspect of this I/O system is that it provides a "bridge"
between the UNIX C file scheme which uses newline as a record
terminator, and the IBM/MS-DOS file scheme which uses Carriage
Return/Line Feed as a record terminator. This conversion is implicitly
performed for all I/O. If non-character data is to be processed, this
conversion must be disabled via :hp1.fbinary:ehp1. or
:hp1._ioctl:ehp1..
:p.
The implemented portions of the two levels of I/O interfaces are
described in a separate document. In general, the standard I/O functions
have the same names as their UNIX counterparts, while the UNIX system
call functions have an underscore ("_") prefixed to their names. For
more detail, users are advised to refer to :cit.The Unix System:ecit. by
S. R. Bourne, or :cit.UNIX Programmer's Manual:ecit. by Bell
Laboratories.
:h1.Debugging
:p.
This Small-c implementation contains very little in the way of debugging
aids. In many cases, this is of no significance, since the C language
has a fascinating tendency to produce programs which run without error
on the first attempt. However, since even the best programmer will
eventually produce errors if his programs are large enough, some
debugging facilities and techniques are usually necessary.
The standard IBM/MS-DOS facilities, combined with common debugging
techniques (such as debug "print" statements) and the few facilities
provided by the language, have proved adequate for debugging thus far.
:p.
Someone who plans to use the compiler extensively should take some time
to familiarize himself with code generated by the compiler. This will
make it easier to associate code sequences from DEBUG "unassemble" with
the correct C source statements, thus reducing or eliminating the need
to obtain assembler listings.
:p.
When using LINK to produce a C program's EXE module, it is a good idea
to obtain a link MAP if debugging of the EXE module is likely to be
necessary. To do this, respond to the LINK "List File" prompt with "PRN"
or a file name AND specify the "/m" option so that the map is really
produced, rather than just obtaining a list of segment sizes. This map
can then be used when dumping global variables or setting DEBUG
breakpoints.
:egdoc.