Text File | 1996-04-02 | 28.5 KB | 772 lines

THE CALC PLUS CLASS LIBRARY

by Vladimir Schipunov, Copyright (c) 1996.
Version 1.0, April 2, 1996


CONTENTS

1. What is CalcPlus library.
2. Distribution and warranty notice.
3. Installation and running examples.
4. How it works.
   4.1 Lexical analyzer.
   4.2 Class CType hierarchy.
   4.3 Class Expression hierarchy.
   4.4 Language and YACC rules.
   4.5 Interface with C++.
   4.6 Modifying source code.
5. Known bugs and problems.
6. Appendix. Language description.


1. What is CalcPlus library.
----------------------------

The CalcPlus library is a C++ class library that lets you build your
own programming language into a C++ project.

Almost any complex C++ application needs to be tuned by some external
description, e.g. an INI file. CalcPlus generalizes this approach: any
algorithm or constant needed by the application can be moved out into
a separate text file. When the process reaches a key point, it calls a
function or procedure stored in that file. The interpreter runs the
function and control returns to the C++ code.

The library contains an interpreter which understands a simple,
nameless procedural language. Bi-directional communication between
C++ and the interpreted code is available. If you develop a C++
project and want to give the user access to the algorithms of the
application, its constants or schemes, then you can probably make use
of the CalcPlus class library.

The version of the language that comes with the library supports
functions, procedures, blocks, a preprocessor, global and local
variables and constants, and if/for/while statements. Each variable
can hold a value of one of the types: nil, bool, long, float, string,
date. Type definitions and arrays are allowed. Functions and
procedures may be recursive. New functions written in C++ can easily
be added to the language. The syntax of the language can be modified
by changing the YACC rules. The interpreter is fast enough to be
helpful for many tasks.

The interpreter was successfully used in the server application of a
Btrieve-based financial package. Parts of the C++ code that often
changed at customers' request were moved into a special file, which
was interpreted by CalcPlus. Another application of the library was
an emulation of the Clipper machine (except "code blocks"); this made
it possible to debug C extensions written for Clipper in a normal C++
environment. (Actually, the interpreter has a lot in common with
Clipper and runs at the same speed on a 16-bit platform; on a 32-bit
platform it is faster.)

The interpreter was written in 1995 in approximately 3 months. Parts
of older C++ and YACC code were used. This is the first freeware
release.

The library aims to be compiler and OS independent. This means you
can compile it on any OS with any C++ compiler; YACC is required.
Templates, exception handling and RTTI were not used, for
compatibility with older compilers.


2. Distribution and warranty notice.
------------------------------------

The CalcPlus Class Library is freeware. You may use, modify and
redistribute it under the condition that the copyright notice is not
removed from the source code.

NO WARRANTY OF ANY KIND, YOU USE THIS SOFTWARE AT YOUR OWN RISK.

Author: Vladimir Schipunov, 25 years,
        email: vschipun@cammail1.attmail.com
        phone: 1-908-2716881

Any comments, suggestions, extensions and bug reports are very
welcome.


3. Installation and running examples.
-------------------------------------

To install the class library, unzip the file calcplus.zip. On UNIX
platforms you will probably need the options -a -L for unzip: -a
converts DOS text with Carriage-Return-Line-Feed to UNIX text with
Line-Feed, and -L converts file names to lower case.

    DOS:    pkunzip calcplus.zip
    UNIX:   unzip -a -L calcplus.zip

This version of the archive contains the files:

    calcexpr.h    |
    calclex.h     |
    calctype.h    |
    yycalc.h      |
    calcexpr.cpp  |  C++ source code
    calclex.cpp   |
    calctype.cpp  |
    calcplus.cpp  |
    calclib.cpp   |
    yycalc.yac       YACC source code
    calc.mak         makefile for DOS
    gcalc.mak        makefile for UNIX
    readme           short description of the archive
    calcplus.txt     this file
    hello            Hello, world!
    example          selfcheck and example of program
    prime            example of program
    pi               example of program

To build the interpreter you will need YACC and a C++ compiler.

DOS:  Command line options for some widely used C++ compilers for DOS
      are written in the file calc.mak. Uncomment or change CC and
      LINK for your C++ compiler and specify your version of YACC if
      necessary. Run the make utility:

          make -f calc.mak

UNIX: gcalc.mak is the makefile for GNU C++, simply run:

          make -f gcalc.mak

The library was tested with many versions of popular compilers:
Borland, Watcom, Microsoft, Zortech, GNU. Generally, you should not
have problems building the interpreter. If you do, please contact me.

The first thing you should do after building the interpreter is to
check it:

a) calc example

   This is the primary check that the interpreter works correctly.
   The file 'example' contains most of the syntax constructions the
   interpreter understands. If the build works correctly, the output
   should be:

   {0,1,4,9} {0,1,4,9,{1,2,3}} {0,1,9,{1,2,3}} 1 7 255 {a,b,c} {a,{d,e,f},c} {a,{d,TRUE,f},c} 3 ab 1 exiting...

b) calc pi [Number of iterations]

   If the interpreter works right you should see something close to
   the number PI.

c) calc prime [Upper limit]

   This test calculates the prime numbers below the upper limit,
   1000 by default.

The command line switch /d can be added to trace the program, e.g.:

    calc prime 100 /d

Note that the interpreter uses recursive algorithms and requires
enough space on the stack. So, if you get the runtime error 'stack
overflow', increase the stack size.

If the interpreter works right, you can begin adapting it to your own
tasks. Here is a general description of the library.


4. How it works.
----------------

First of all, the interpreter is designed using YACC (Yet Another
Compiler-Compiler). A hand-written lexical analyzer takes input from
the file(s); the yyparse() function processes the input and builds
the program as a tree of instructions. Each node of the tree has its
own value, and execution goes from the child nodes to the root.

So, CalcPlus consists of: the lexical analyzer, the yacc parser, the
hierarchy of basic types, and the hierarchy of language instructions.


4.1. Lexical analyzer.
----------------------

In general, the function YLex::yylex() is the traditional translator
of the input stream into tokens for the yacc parser. Class YLex takes
care of token analysis and performs simple preprocessing. However,
it has some special features, listed below.

Tokens are divided into two groups. The first contains keywords,
signs of arithmetic operations, etc.; the second contains tokens
returned by descendants of the YLex class. The overridable method
YLex::__name() may find the word in the lists of already defined
symbols: functions, procedures, structures, variables or constants.
Tokens of the first group are named lx***, tokens of the second
yy***.

Several simple container classes provide storage and linear search
of objects and references to them. There is also a stack container.
Objects stored in the stack are destroyed when they are popped, so
often only references to objects are pushed onto the stack.

Preprocessing is performed by pushing onto the stack of input
streams. Only the 'include', 'define', 'ifdef' and 'endif' directives
are supported. The input stream is an 'ifstream' for file input and
an 'istrstream' for the preprocessor.

The symbol '->' is translated as 'implication'. When the interpreter
finds the statement 'A -> B' it treats it as 'if A then B end'.

Strings delimited by either '...' or "..." are allowed.
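The rule about the two string delimiters can be illustrated with a
small sketch. It is not taken from calclex.cpp: the function name
scanString is hypothetical, and it assumes that a literal ends at the
next occurrence of the same quote character that opened it, with no
escape sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of CalcPlus. Reads a string literal that
// starts at src[pos] and is delimited by either ' or ".
// On success stores the body in `out`, moves `pos` past the closing quote
// and returns true. Returns false if src[pos] is not a quote or the
// literal is never closed.
bool scanString(const std::string& src, std::size_t& pos, std::string& out)
{
    assert(pos < src.size());
    const char quote = src[pos];
    if (quote != '\'' && quote != '"')
        return false;                          // not a string literal
    // Assumption: the closing delimiter must match the opening one.
    const std::size_t close = src.find(quote, pos + 1);
    if (close == std::string::npos)
        return false;                          // unterminated literal
    out = src.substr(pos + 1, close - pos - 1);
    pos = close + 1;
    return true;
}
```

Under this assumption a double-quoted literal may contain a single
quote and vice versa, which is one practical reason for a lexer to
accept both delimiters.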
Comments are of C++ style:

    // This is comment
    /* Another comment */

Case of letters is insignificant. In all other respects the lexical
analyzer acts like any other yylex function.


4.2. Class CType hierarchy.
---------------------------

Class CType is the base class for all classes corresponding to the
CalcPlus types. They are: CNil, CBool, CLong, CDouble, CString,
CDate, CArray. These types may be used when writing a program for the
interpreter. When yylex() finds an immediate value in the program, it
allocates an object of the appropriate type. For instance, the
following constants correspond to the types:

    2           CLong
    1.2         CDouble
    true        CBool
    'aaa'       CString
    {1,2,3}     CArray

New types may be added by simply adding a new class inherited from
the abstract base class CType. Values of new types may be returned by
C++ functions, or yyparse() may be changed to translate input tokens
into instances of the new class.

Class CType has a good number of pure virtual methods; some of them
may actually be left as dummies. First of all, a class should provide
its identification. A unique numeric identifier is taken from the
enumeration in the file calctype.h. This identifier should be passed
to the constructor of CType and returned by the method type(). The
method name() is the symbolic identification of the class.

The method copy() returns a pointer to a copy of the object allocated
by operator new. Usually this is a call of the copy constructor:

    CType* copy() const { return new CNewClass( *this ); }

Input and output methods should be overridden as well. They are:

    void print( ostream& ) const;
    void get( istream& );

Other important methods are comparison and assignment:

    CType& operator =( const CType& t );
    int compare( const CType& ) const;

In order to provide a standard implementation of these operations for
simple objects represented by a sequence of bytes, the methods
data(), size() and ptr() are used. These methods are intended to give
access to the binary data of objects.
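To make the role of these three methods concrete, here is a minimal
sketch of how a base class could offer byte-wise equality and
assignment on top of them. The class names PlainValue and LongValue
and the methods sameBytes() and assign() are hypothetical stand-ins,
not the declarations from calctype.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a CType-like base class.
class PlainValue {
public:
    virtual ~PlainValue() {}
    virtual const void* data() const = 0;   // read-only view of the bytes
    virtual void*       ptr()        = 0;   // writable view of the bytes
    virtual std::size_t size() const = 0;   // number of bytes

    // Default equality: same size and identical bytes.
    bool sameBytes(const PlainValue& t) const {
        return size() == t.size() &&
               std::memcmp(data(), t.data(), size()) == 0;
    }

    // Default assignment: raw copy, valid only for same-sized objects.
    void assign(const PlainValue& t) {
        assert(size() == t.size());
        std::memcpy(ptr(), t.data(), size());
    }
};

// A long stored as raw bytes, in the spirit of CLong.
class LongValue : public PlainValue {
    long value;
public:
    explicit LongValue(long v = 0) : value(v) {}
    const void* data() const override { return &value; }
    void*       ptr()        override { return &value; }
    std::size_t size() const override { return sizeof value; }
    long get() const { return value; }
};
```

A raw byte comparison is only safe for equality. Ordering ("less
than") depends on the byte order of the machine and on the type, so a
real class would still need type-specific code for it.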
The only difference between data() and ptr() is that data() is a
const method, and it should be used in most cases rather than ptr().

    const void *data() const;
    void *ptr();
    int size() const;

Class CArray is the only non-primitive data type. It implements the
data() and ptr() methods to return a null pointer and overrides the
comparison and assignment methods. Arrays may be indexed by strings.
The field CArray::structure points to an array of CString's;
operator[](const char*) looks for the given string in this index
array and returns a reference to the element of the array that has
the same position as the matching element of CArray::structure.

String indexes are used with structure types:

    struct abc {a,b,c};
    abc a;
    echo a.b;

This will be translated as a CArray indexed by the CArray
{'a','b','c'} and echo a['b']. Such definitions may be useful though
they are not strict. No optimization is made for the speed of the
index search.

Classes of the CType hierarchy are relatively simple. For details of
the implementation refer to the source code in file calctype.cpp.


4.3. Class Expression hierarchy.
--------------------------------

Classes of the Expression hierarchy are the main part of the
interpreter. After parsing, all input is converted into a tree whose
nodes are instances of classes derived from Expression. Each node,
being an instance of Expression, has:

1) the field 'flags', which shows the current state of the node
2) the field 'v', which is the pointer to the value of the node
3) the method 'Calc', which is called every time the node is
   calculated

Let us see what the process actually does during interpretation. The
recursive method Expression::Calculate runs first on the child nodes.
If the execution of the child nodes was not interrupted by setting
non-zero flags, then the method Calc() of the node is called. Calc()
refers to the values of its child nodes.
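The children-first evaluation with flag checking can be sketched as
follows. This is a simplified model, not the code from calcexpr.cpp:
the struct Node, its derived classes and the flag names are
hypothetical, and the node value is a plain long rather than a
pointer to a CType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an expression node.
struct Node {
    enum { flOk = 0, flError = 1 };

    int flags = flOk;
    long v = 0;                       // value of the node
    std::vector<Node*> child;         // children are not owned here

    virtual ~Node() {}
    virtual void Calc() = 0;          // combines already computed children

    // Children first; stop as soon as one of them sets a flag.
    void Calculate() {
        flags = flOk;
        for (std::size_t i = 0; i < child.size(); ++i) {
            child[i]->Calculate();
            if (child[i]->flags != flOk) {
                flags = child[i]->flags;  // copied from child to parent
                return;                   // Calc() of this node is skipped
            }
        }
        Calc();
    }
};

struct Const : Node {
    explicit Const(long x) { v = x; }
    void Calc() override {}           // value is fixed
};

struct Add : Node {
    Add(Node* a, Node* b) { assert(a && b); child.push_back(a); child.push_back(b); }
    void Calc() override { v = child[0]->v + child[1]->v; }
};

struct Div : Node {
    Div(Node* a, Node* b) { assert(a && b); child.push_back(a); child.push_back(b); }
    void Calc() override {
        if (child[1]->v == 0) { flags = flError; return; }  // runtime error
        v = child[0]->v / child[1]->v;
    }
};
```

In this sketch a division by zero deep in the tree sets flError on
the Div node, and every ancestor copies the flag without running its
own Calc(), which mirrors the propagation rule described above.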
For instance, if we have derived the class Addition from the abstract
class Expression, then the method Addition::Calc() could look like:

    void Addition::Calc()
    {
        *v = *child[0]->v + *child[1]->v;
    }

More precisely, we should check the types of the arguments, because
this is an interpreter and there is no type checking at compile time.
So the method should rather be:

    void Addition::Calc()
    {
        if( child[0]->type()!=idLong || child[1]->type()!=idLong )
        {
            flags = exError;
            return;
        }
        delete v;
        v = new CLong;
        (CLong&)(*v) = (CLong&)(*child[0]->v) + (CLong&)(*child[1]->v);
    }

The process is controlled by the state of the bit flags of the nodes.
There are normal flags, like the indication of a function return or
an exit from a while/for statement, and flags showing that a runtime
error occurred. Flags are copied from children to parents. Analysis
of the state of the flags can show the location of the node where the
error occurred.

Every statement written in the input language is translated to an
instance of the corresponding class of the Expression hierarchy. This
is the picture of the hierarchy.

    Expression                        base class
     |
     |-------XImmediate               immediate value
     |        |
     |        +-------XEndl           line-feed constant
     |
     |-------XBreak                   exit from for/while
     |
     |-------XAr1                     unary arithmetic
     |        |
     |        +-------XBool1          unary boolean
     |
     |-------XAr2                     binary arithmetic
     |        |
     |        +-------XComparison     comparison
     |        |
     |        +-------XBool2          binary boolean
     |
     |-------XVariable                variable
     |
     |-------XEcho                    output on cout
     |
     |-------XConditional             if ... then ... else ... end
     |
     |-------XLoop                    like 'continue' in C
     |
     |-------XWhile                   while/for
     |
     |-------XBlock                   begin ... end
     |        |
     |        +-------XFunction       func/proc ... end
     |        |
     |        +-------XUserFunction   external C++ functions
     |
     |-------XCall                    function call: f(1,2,3)
     |        |
     |        +-------XDynamic        function by name: &('f')(1,2,3)
     |
     |-------XReturn                  return expr
     |
     +-------XSet                     arrays: {1,2,{1,2,3}}

Though C++ is a much easier language than English, and a full
description of the algorithms and methods used can be found in file
calcexpr.cpp, in the next paragraphs we will discuss some non-obvious
points of the architecture of the Expression hierarchy.

Class Var is used to store the values of variables. There may be many
references (nodes of the tree) to a variable in different
expressions; each reference is an instance of class XVariable.

Class XVariable has the fields:

    PrintObj* obj;
    CType** ptr;
    int ref;

The field obj is used for debugging output, ptr for setting a new
value of the variable. The field ref is used as a flag for passing an
argument to a function by reference. This field is set by the XCall
class temporarily, while the function works.

The method XVariable::Calc() acts differently depending on the number
of child nodes. If the number of children is zero, then this is the
use of the variable inside another expression, e.g. (x+y). If the
XVariable node has only one child node, then this is an assignment:
x := expr. When the number of child nodes is two, this is an array
element inside an expression: (x[i]+y[j]). The only case left is
three child nodes, which is an assignment to an array element:
x[i] := expr. child[0] is considered to be the array, child[1] the
index in the array, child[2] the expression which is assigned.

Class XBlock is the composite expression consisting of a number of
subexpressions. In the input language the corresponding statement is:

    begin
        expr1
        expr2
        ...
        exprn
    end

Every block bounds the visibility of the variables defined inside it.
That is why class XBlock has the fields vars, funcs and structs.
Actually the list of functions is used only in the global context.
A function is a block, and its arguments are local variables.
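Returning to XVariable for a moment, the four cases distinguished by
the number of child nodes can be summarized in a small sketch. The
enum VarAction and the function classify() are hypothetical names
introduced only for illustration; the library itself may well branch
on the child count directly inside XVariable::Calc().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; CalcPlus does not necessarily define such an enum.
enum VarAction {
    vaRead,           // 0 children: x used inside an expression
    vaAssign,         // 1 child:    x := expr      (child[0] = expr)
    vaReadElement,    // 2 children: x[i] in an expression
                      //             (child[0] = array, child[1] = index)
    vaAssignElement   // 3 children: x[i] := expr
                      //             (child[2] = assigned expression)
};

// Maps the number of child nodes to the action described in the text.
inline VarAction classify(std::size_t childCount)
{
    assert(childCount <= 3);          // more children are not described
    switch (childCount) {
        case 0:  return vaRead;
        case 1:  return vaAssign;
        case 2:  return vaReadElement;
        default: return vaAssignElement;
    }
}
```

Encoding the operation in the child count keeps a single node class
for all four uses of a variable, at the price of a convention that
the parser and Calc() must both respect.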
A function has only one child node of class XBlock, which is the body
of the function.

Just as XVariable refers to Var, XCall refers to XFunction. The
arguments of the function are the child nodes of XCall. The method
XCall::TieArgs() is called twice. The first call assigns the values
of the arguments to the local variables of XFunction. The second
assigns values back to the arguments passed by reference.

It is easy to see that keeping temporary results of calculation in
the nodes of the tree does not, by itself, allow recursive function
calls. To remove this problem the method Expression::Recursion is
used. There is a stack of pointers to the values. On a signal, all
subnodes of the function put their values onto the stack. This
action is synchronized with the passing of arguments in the method
XCall::TieArgs().

There is no separate class XFor. Class XWhile provides both types of
iteration: for and while.

Most of the other classes are obvious and intuitively clear.


4.4. Language and YACC rules.
-----------------------------

Class CalcPlus is derived from the class YLex. It overrides the
method yyparse() for YACC parsing of the tokens from the input
stream. The language is described by a set of rules for YACC;
generally, every rule simply translates its arguments into the
appropriate instance of a class of the Expression hierarchy.

For correct context handling, a stack mechanism is used. Each
recursive syntax construction has a corresponding stack container in
the class CalcPlus. They are:

    LexStack Blocks;    // blocks
    LexStack Calls;     // function calls
    LexStack Cond1;     // 'if' part of the condition
    LexStack Cond2;     // 'else' part of the condition
    LexStack Sets;      // sets
    LexStack Idx;       // array indexes

The frequently used definitions XBEG, XEND and XSEQ are intended for
handling the current block context. When a new variable is defined,
we store it in the list of variables of the current block.
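The idea of a current block context can be sketched with a small
stack of variable lists. The class BlockContext below is a
hypothetical stand-in: the real library uses its LexStack container
and XBlock nodes, and the outward search through enclosing blocks is
an assumption about how visibility is resolved.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's block context.
class BlockContext {
    typedef std::map<std::string, long> Vars;
    std::vector<Vars> blocks;             // innermost block is at the back
public:
    // Called when a block opens ('begin', function body, loop body...).
    void begin() { blocks.push_back(Vars()); }

    // Called when the matching 'end' is parsed; its variables disappear.
    void end() {
        assert(!blocks.empty());
        blocks.pop_back();
    }

    // A new variable always goes into the current (innermost) block.
    void define(const std::string& name, long value) {
        assert(!blocks.empty());
        blocks.back()[name] = value;
    }

    // Assumption: lookup goes from the innermost block outwards.
    // Returns a null pointer if the name is not visible.
    const long* find(const std::string& name) const {
        for (std::size_t i = blocks.size(); i-- > 0; ) {
            Vars::const_iterator it = blocks[i].find(name);
            if (it != blocks[i].end())
                return &it->second;
        }
        return nullptr;
    }
};
```

With such a context, a variable defined inside an inner block hides
an outer variable of the same name until the inner block is closed.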
if/else/for/while statements have implicit blocks inside them:

    if a then
        a:=b;
    end;

This is actually translated as:

    if a then
        begin
            // local variables may be defined here
            a:=b;
        end;
    end;

Class CalcPlus overrides the method __name and searches for symbols
that are already defined. This makes syntax analysis easier.

The method Link uses a recursive tree search to connect XCall nodes
with the corresponding XFunction. Simple diagnostics are performed.

The method UserSym() is not implemented. It was initially added as a
hook for different extensions. For example, we can change the method
__name to translate symbols beginning with the character '@' as
lxUser:

    Token CalcPlus::__name()
    {
        Token t = YLex::__name();
        if( t == lxName && *Lex == '@' )
        {
            Expression* e = new XImmediate( new CString( Lex+1 ));
            YYLVAL( e );
            return lxUser;
        }
        ...
    }

When the function yyparse() gets such a token it calls the method
UserSym(). There are two possible calls: with one or with two
arguments. One argument is passed if the token is detected on the
right side of an assignment, two arguments if the token is on the
left. A possible implementation of the method UserSym may be:

    Expression* CalcPlus::UserSym( Expression *e1, Expression *e2 )
    {
        Expression *e = new XEcho;
        if( e1 ) e->Add( e1 );
        if( e2 ) e->Add( e2 );
        return e;
    }

So the statement "@Hello := ' World!';" will print: Hello World!


4.5. Interface with C++.
------------------------

It is possible both to call C++ code from interpreter code and to
call interpreter functions and procedures from C++.

There are a number of definitions at the end of the file calcexpr.h
that help in writing C++ functions visible from the interpreter.
Let's see the implementation of the functions EMPTY and GETENV:

    USER_FUNC( Empty )              // Is value empty?
    DEF_ARGX( 0, x )                // We don't know the type of argument
    RETURNS( Bool )                 // Function returns TRUE or FALSE
    ret = x.empty();                // Getting the result
    USER_END                        // Done

    USER_FUNC( Getenv )             // Reading environment variable
    DEF_ARGV( 0, var, String )      // Expecting string argument
    RETURNS( String )               // Result will be the string also
    const char* s = getenv( var );  // Calling C function
    ret = s ? s : "";               // Check if var is not in env.
    USER_END                        // Done

Functions and procedures must be registered before running the
interpreter to make them visible from the program. The function
UserLib, defined in the module calclib.cpp, performs the
registration:

    void UserLib()
    {
        RegFunc( "EMPTY", Empty );
        RegFunc( "GETENV", Getenv );
        //
        //  Other functions
        //
    }

If a function takes more than one argument, the number of arguments
should be passed as the third parameter. For procedures, DEF_PROC
and RegProc are used.

A call of an interpreter function from C++ code is illustrated in
the file calcplus.cpp, where the interpreter function 'atexit' is
called when the program finishes. The method CalcPlus::Call() takes
as arguments a pointer to the function name, the number of arguments,
and pointers to the CType arguments:

    if( calc.Global->funcs( "atexit" ))
    {
        CString s("exiting... ");
        calc.Call( "atexit", 1, &s );
    }


4.6. Modifying source code.
---------------------------

If you are going to use the library in your project, then most
likely you will have to change its source code for your own needs.
There are different ways of modifying the source code, and which one
is better depends on how serious the changes you need are.

The simplest way to extend the library is to add new functions
visible from the interpreter. This can be done by modifying the file
calclib.cpp.

Another easy modification is a change of the language syntax, see
the file yycalc.yac.

A more difficult solution may require changing the CType and
Expression hierarchies. In this case you should override the
necessary methods and probably change the YACC rules.
Actually, the whole CType hierarchy can be replaced by your own
hierarchy, if you already have something like that in your project.

As an example of changes in the source code, let us consider the
steps necessary to implement big number arithmetic:

a) We need a CType descendant which will store, print and calculate
   very big numbers (hundreds of significant digits).

b) The method Calc() of the classes Ar1, Ar2 and Comparison should be
   changed to be able to handle big numbers.

c) YLex::yylex() must detect a big number in the input stream and
   return the corresponding token. This can also be done by adding a
   special conversion function.

d) CalcPlus::yyparse() must generate new Immediate( new BigNumber )
   when such a token is detected.

After these steps, big number arithmetic will hopefully be available
from the interpreter's programs.


5. Known bugs and problems.
---------------------------

The biggest known problem is obvious: error diagnostics are too
simple. A user with little programming experience may have a lot of
problems trying to write a program for the interpreter.

The compiler reports line number 0 when EOF is detected inside an
unclosed block. So line number 0 means the last line of the file.

Complex recursive define directives may not work properly. There has
been no real reason yet to develop a full built-in preprocessor.

Passing arguments to a function by reference is not absolutely
correct. There were problems when operator throw was used in C++
code called from inside such a function. However, this problem can
easily be avoided by adding a flag to the variable which says that
the variable has passed its value to the function, so operator
delete must not be used on the pointer to the value.

If you have found more errors, bugs or problems, please let me know.


6. Appendix. Language description.
----------------------------------

This is an informal description of the language of the CalcPlus
interpreter. Most of the syntax looks and works like the same syntax
in other languages.
The language has a lot in common with C and Clipper.

Like C:

    A module is the unit of compilation.
    A program can consist of more than one module.
    The start symbol is MAIN if not redefined.
    A file can include other files with the #include 'filename'
    directive.
    Global variables may be defined in the module context.
    The semicolon ';' is the separator between statements.
    The sign '!' is the logical NOT.

Not like C:

    Case is insignificant.
    The preprocessor has only the 'define', 'ifdef' and 'endif'
    directives.
    No logical operations are available in the preprocessor.
    There are no static variables.
    Assignment is ':='.
    Strings are written with either (') or (") as the delimiter:
    'str1', "str2".
    A single '=' sign is used for comparison: if a=b then ... end;
    The EXIT and LOOP keywords are the same as 'break' and
    'continue' in C.
    The OR and AND keywords are used instead of || and &&.
    ARRAY, ADEL and AADD should be used for array access.
    The ARGC and ARGV functions provide access to the command line
    arguments.
    '<>' is used instead of '!=' for logical not_equal.

Functions and procedures must be declared before they are called; a
description of the arguments is not required. Functions and
procedures may contain a return statement. The default return value
is NIL.

    func a;
    proc b;
    ...
    var x:=a(1,2,3);
    b('abc');
    ...
    func a(x,y,z)
        return x+y+z;
    end;
    proc b(x)
        echo x,endl;
    end;

Arguments preceded by the sign '*' are passed to the function by
reference. Arrays are always passed by reference. The result of the
program below will be 1 2 3 3:

    func a(x)
        x:=x+1;
        return x;
    end;
    proc main()
        var x:=0;
        echo a(*x),' ',a(*x),' ';
        echo a( x),' ',a( x);
    end;

Blocks are allowed at any place inside a function or procedure:

    begin
        expr1;
        expr2;
        ...
        exprn;
    end;

Variables, constants and structures may be declared at any place in
the program; they are visible only inside the block where they were
defined. There is no type checking. Structures are actually arrays
indexed by arrays of strings.

    var count := 1;
    const pi := 3.14;
    struct abc {a,b,c};
    abc test;

Output is performed by the ECHO statement followed by a list of
expressions.
ENDL is the Line-Feed constant.

    echo '2*2=',2*2,endl;

All control structures (begin, if, for, while, func, proc) must be
closed by the keyword 'end'. Variables defined inside such
structures are local to them. The STEP keyword may be omitted in the
FOR statement; the step is 1 by default.

    if a=b then
        f1();
    else
        f2();
    end;

    for i:=1 to 10 step 2 do
        echo i*i,' ';
    end;

    while a>b do
        b := b+1;
    end;

All the arithmetic operations are usual: ((2+3)-1)*6/2.