More on the infix parser. ========================= N. Wirth helped me with this one. I was leafing through his book: Algorithm+DataStructures=Programs one day and found some charting, I forget the name of it. Anyhow, they seemed neat at the time and give almost a direct correspondence to code. Something other people using other types of flowcharts have also noticed. (Notably: Wil Baden in Charting Forth, 1986 FORML Conference Procedings, a superb paper). I know some people are probably horrified. Using Wirth for Forth? Well, I believe that Wirth one day will write (if he didn't already) a threaded interpreted language. I also wanted to write a recursive utility that did a little more then calculate a bunch of numbers in a special sequence. (fibo...? a heck.) If you don't like it recursive, look through the back issues of Forth Dimensions for a operator-stack type parser. What follows are an explanation of the precedence, the charts on which it is based (I hope they are clear enough) and a small description of the routines which is in addition to the shadow screens. Precedence. ----------- Every infix parser takes precedence of operators into consideration. The following are defined from the highest to lowest precedence: operator explanation ( ) nesting - ~ negate not << >> shift left shift right & ! and or * / multiply divide + - add subtract These operators work on numbers, which have a 32 bit limit in my system. Numbers can also start with an $ sign. This indicates an hexadecimal radix, and for the number following only. In addition to numbers the system can use identifiers. These identifiers are normal Forth constants. Forth variables would use the address of that variable, simply what that identifier leaves on the stack. Identifiers must start with a _ (underscore) . (period) or an alpha character. The rest can be alpha characters, numbers or _ and . The length of the identifier is limited by the underlying Forth, usually 31 characters. Charts. ( From the top down ) ------- expression ------------> simple-ex >----+--+------> ^ | | | v v | | + - | | | | +--------------------+--+ simple-ex ------------> term >----+--+------> ^ | | | v v | | * / | | | | +--------------------+--+ term ------------> factor >----+--+------> ^ | | | v v | | & ! | | | | +--------------------+--+ factor ------------> factor1 >----+--+------> ^ | | | v v | | << >> | | | | +--------------------+--+ factor1 ---+-------+-----> factor2 >---------> | ^ | | +-> - >-+ | | +-> ~ >-+ factor2 ---+--------> getconstant >---------+----> | ^ | | +-> ( >--> expression >---> ) >-+ getconstant ---+--------> getnumber >----------------> | ^ | | +-------> identifier >-------+ | | | | +--> ' >--> getchar >--> ' >--+ getnumber ---+---------------> digit >-+-----------> | ^ ^ | | | | | +--> $ >--+ +-----------+ identifier ---+-> letter >-------+----------------+--> | ^ ^ | ^ | | | | | | | +--> _ >---+ | +--> letter >--+ | | | | | | | | | | | | | +--> . >---+ | +--> digit >--+ | | | +------------------+ Routines. --------- TEST Small word to test the evaluation process. Uncomment it before loading. To use it, type from the Forth prompt (e.g.): 5 constant testcon test 2*8+_testcon which should leave the answer on the stack. Reverse the order and the result should be the same, precedence. EVALUATE (s addr len -- n ) The topmost word of the parser. It returns the value of the expression, using the string non destructively. It initializes the pointer and counter to keep track of the current character to be examined, ie. the symbol. (EXPRESSION) The default for expression. This word corresponds to the first graph. Note that it is indirectly recursive, by calling the deferred word expression, which is itself. Also I have omitted stack pictures, in a recursive context it varies with the input. Doesn't matter as long as the end result is not variable. Program flow is almost a direct correspondence to the graph. SIMPLE-EX The next graph translates into this word. Also recursive, as the arrow around the graph implies. The order of operator precedence is bottom up; this is the second level from the bottom, the characters * and / are scanned for here. If not present, the routine is exited, with the result from TERM. TERM Recursive routine, looks for the logical & and ! symbols. It is expected that the expressions presented to this parser are not to deeply nested, or the stack might overflow? Normally, that will not be the case. Most calculators allow appr. 10 levels of nesting, and I think this one will outdo them. FACTOR Recursive also. Here I cheated a little, I don't parse the entire >> and <<. I simply ignore the second character. The second 'readsymbol' takes care of that. FACTOR1 The negate and not level. It is not recursive, simply decides on the character in symbol, what to do with factor2. The graph doesn't have an arrow back to itself. FACTOR2 Looks at the next symbol, and if it is a (, bracket that is, will call expression, and cause indirect recursion, by calling the entire sequence. Note how the missing ), closing bracket, is handled. If it is not a nesting, getconstant gets called. GETCONSTANT A constant can be a character, a number or an identifier. Each one has a special starting character, allowing for this parser not to have to backtrack. Letter? is in this version [ . | _ | a..z | A..Z ] IDENTIFIER Places the substring from the expression at (Forth's) 'here', appends a null character after it (required in my system) and finds it in the Forth context dictionary. If not found aborts with an error message, displaying the unknown identifier. If found, it will execute the word. Therefore if the word is a constant it will return a value on the stack. Any word returning a value could be used, even one with side effects. GETNUMBER Also places a substring at 'here', but calls 'number' to convert the string into a value. The number is in hex base if preceded by $. Again: ' 0 here count + c! ' is particular to my system, most other F83's will require a blank instead of a zero. GETCHAR Read the next symbol and uses it as a character constant. Since Forth already translates it for me, it is easy. It does expect a closing single quote and aborts complaining if it is missing. The first screen of the parser is a set of normal Forth words the explanation on the shadow screen is explicit enough. The only strange trick used here is not to point to the beginning of the string, but to one character past the end. This allows the length of the string to be decremented to zero, at which time the string is exhausted. This makes it unnecessary to keep the length in a variable and comparing a counter, which is incremented, to it. The current symbol is pointed to by symbol> - insymbol or the end of the string less the counter. On this screen I have also defined what a digit and what a letter should be. This parser can be extended, modified, to allow more levels and operators, or to change the meaning of identifiers. An example of how techniques from other languages can be used in Forth. And this parser can be used for people who cannot get the hang of RPN. I wrote it to input a bunch of text files written in another language. Peter Appelman.