How evaluate.c works

Calculate() is just a front end for the evaluate() function. evaluate() is comprised of two main parts - a scanner (also known as a tokenizer) and a token evaluator.

The scanner turns a stream of characters (the expression string) into a stream of tokens.

  1. The first thing it does is allocate enough memory, where the worst case is that every character is an individual token. It now considers this as a 'list' of tokens, with a 'list pointer' pointing at the first unused token.
  2. We then perform a loop that traverses every character individually. We can make the loop pointer 'skip forward' for tokens that have more than one character.
  3. To translate characters into tokens very quickly, we have a 'scantable', where every possible character is given a token value. This token value is written into the 'current token'. But that's not enough to tokenize. So, we then make a few decisions based on what the token was.
  4. After a successful scan, we 'lace up' the tokens, using pointers to tokens, so we can easily insert other tokens without copying them about. We also null-terminate any variable names - we couldn't do this earlier as it would destroy the following token.
Now we've done the token scan, we must perform work on the tokens until they are fit for evaluation. Finally, we're ready for evaluation. We make two stacks - one for holding values and one for holding operators. Why? Well, we'll be using what's known as postfix evaluation, which is unambiguous. This requires that we stack all values as we see them then apply the operators in the correct order. The operators pull as many values as neccesary from the stack, then push the result back. Eventually, there's only one value on the stack - the result. However the token stream is in infix notation - what can we do? Convert one to the other. The idea of an infix to postfix converter is that the innermost operations get done first, then the outermost ones. This is done by running from left to right through the token stream and pushing operators to a stack. However, for binary operators, all higher-precedence operators already on the stack must be output to some destination stream before a new operator can go on. Similarly, a close bracket forces a 'flush' of the stack back to the nearest open bracket on it. The algorithim goes like so:
empty postfix list
FOR all tokens in infix list
  SELECT current token
  CASE variable, value
     append current token to postfix list
  ENDCASE

  CASE unary operator, open bracket
    push token onto stack
  ENDCASE

  CASE binary operator, close bracket
    WHILE precedence(token on top of stack) > precedence(current token)
      pull token from stack, append it to postfix list
    ENDWHILE

    IF current token = close bracket
      pull token from stack, cause error if it's not an open bracket
    ELSE
      push current token onto stack
    ENDIF
  ENDCASE
ENDFOR
However we've been a bit clever and instead of appending operators to a list, we execute them right away. And that's basically it, apart from a few decisions here and there about converting arguments to and from int and reals.

How do the 'stacks' work?

I don't use some ADT to operate the stacks, I use basic C operations. I have an array of the appropriate type which has enough entries to hold anything I want. I also have a 'stack pointer', which points at the current topmost item on the stack. It is initialised to -1, to say that the stack is empty. The following C fragments show the canonical stack operations:
valstk = the stack
vstk = the stack pointer

if (vstk < 0)           : if the stack is empty
if (vstk < n)           : if the stack has n items or less on it
valstk[vstk]            : get the item on top of the stack
valstk[++vstk] = item   : push an item onto the stack
valstk[vstk--]          : pull an item from the stack

Is there anything that could be better?


Kyzer/CSG