<!-- Forthmacs Formatter generated HTML output -->
<html>
<head>
<title>Text Interpreter</title>
</head>
<body>
<h1>Text Interpreter</h1>
<hr>
<p>
This chapter describes the workings of the Risc-OS Forthmacs Text Interpreter
(the interpreter that compiles or executes Forth commands). The chapter
contrasts the Risc-OS Forthmacs Text Interpreter with a number of more
conventional implementations which have been used in other Forth systems.
<p>
<p>
<h2>FIG-Forth Text Interpreter</h2>
<p>
FIG-Forth used a variable <code><A href="_smal_AQ#2e0"> state </A></code> whose
value was 0 when interpreting and (hex) C0 when compiling. The interpreter was
coded as a single word <code><A href="_smal_BJ#231"> interpret </A></code>
which tested <code><A href="_smal_AQ#2e0"> state </A></code> to determine
whether to compile or to interpret. Here is the code:
<p>
<p><pre>
: INTERPRET ( -- )
BEGIN -FIND
IF STATE @ <
IF CFA , ELSE CFA EXECUTE THEN
ELSE HERE NUMBER DPL @ 1+
IF DROP [COMPILE] LITERAL
ELSE [COMPILE] DLITERAL
THEN
THEN ?STACK
AGAIN ;
</pre><p>
<p>
The <strong>state @ <</strong> phrase is pretty clever (or disgusting,
whatever way you wish to look at it). Since the value stored in <code><A href="_smal_AQ#2e0"> state </A></code>
is (hex) C0 when compiling, and since the length byte of a defined word (which
is left on the stack by <strong>-find</strong> ) is in the range (hex) 80-BF for
a non-immediate word and in the range (hex) C0-FF for an immediate word, the <strong>state @ <</strong>
test manages to return <code><A href="_smal_AE#304"> true </A></code> only if
the <code><A href="_smal_AQ#2e0"> state </A></code> is compiling and the word
is not immediate. This fact is not crucial to our discussion but is included
here to avoid confusion.
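<p>
The comparison can be tried by hand. The following is a minimal sketch, not FIG-Forth source: <strong>compile-it?</strong> is a hypothetical name, and the length bytes are sample values picked from the ranges given above.
<p>
<p><pre>
hex
\ Hypothetical helper: the same test that STATE @ &lt; performs
: compile-it? ( length-byte state-value -- flag ) &lt; ;
85 C0 compile-it? . \ non-immediate word, compiling: true flag (compile it)
C5 C0 compile-it? . \ immediate word, compiling: false flag (execute it)
85 0 compile-it? . \ any word, interpreting: false flag (execute it)
decimal
</pre><p>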
<p>
<code><A href="_smal_AQ#2e0"> state </A></code> is explicitly tested once
inside this loop, but if you look at the code for the word <code><A href="_smal_AS#252"> literal </A>,</code>
it also tests <code><A href="_smal_AQ#2e0"> state </A></code> to decide whether
to compile the number or not.
<p>
To switch between compiling and interpreting, FIG-Forth uses two words [ and ].
[ is immediate and simply stores 0 into <code><A href="_smal_AQ#2e0"> state </A>.</code>
] is not immediate and stores (hex) C0 into <code><A href="_smal_AQ#2e0"> state </A>.</code>
Compilation is typically started with : , which is defined something like:
<p>
<p><pre>
: :
&lt;some irrelevant stuff&gt;
] ;code
&lt;some assembly language stuff&gt;
end-code
</pre><p>
<p>
The important point here is that when : executes to define a new word, the ]
just sets the <code><A href="_smal_AQ#2e0"> state </A></code> to compiling,
then the <code><A href="_smal_BN#115"> ;code </A></code> proceeds to execute.
(The purpose of <code><A href="_smal_BN#115"> ;code </A></code> is to patch the
code field of the word defined by : so that it does the appropriate thing for a
high-level Forth word). The word <code><A href="_smal_BJ#231"> interpret </A></code>
doesn't notice that <code><A href="_smal_AQ#2e0"> state </A></code> is now
compiling until the <code><A href="_smal_BN#115"> ;code </A></code> finishes.
<p>
So we see that [ and ] are fairly innocuous; they just change the value of a
variable.
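<p>
As a sketch of how little these two words do, here is a version that uses its own variable, so that it can be loaded without disturbing a running system. The names <strong>fig-state</strong>, <strong>fig-[</strong> and <strong>fig-]</strong> are hypothetical stand-ins for the real <strong>state</strong>, [ and ].
<p>
<p><pre>
variable fig-state \ stand-in for STATE (assumption)
hex
: fig-[ ( -- ) 0 fig-state ! ; immediate \ switch to interpreting
: fig-] ( -- ) C0 fig-state ! ; \ switch to compiling
decimal
</pre><p>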
<p>
<p>
<h2>Poly-FORTH Text Interpreter</h2>
<p>
Forth, Inc. decided that it would be preferable to have two separate loops for
the two separate functions of compiling and interpreting. The compiling loop
was called ], so ] actually executed the compile loop directly, rather than just
setting a variable. This has two subtle side effects.
<p>
If you look at the previous definition of :, and now pretend that instead of
just setting a variable, ] actually executes the compiler loop, you will see
that the <code><A href="_smal_BN#115"> ;code </A></code> following it doesn't
actually get executed until after the compiling is finished. This in itself
doesn't cause a problem for :, but the use of ] inside programmer-defined words
sometimes caused unexpected behavior because stuff after the ] would get
executed after a bunch of stuff had been compiled.
<p>
The other subtlety relates to how the loops are terminated. Note that the <code><A href="_smal_BJ#231"> interpret </A></code>
loop shown above never terminates! We all know that it really does terminate,
and the mechanism is fairly kludgey. What happens is that there is a null
character at the end of every line of text in the input stream, and at the end
of every <code><A href="_smal_AV#165"> block </A></code> of text from mass
storage. The text interpreter picks up this null character just like a normal
word. The dictionary contains an entry which matches this "null word". The
associated code is executed, and it plays around with the return stack in such a
way that the <code><A href="_smal_BJ#231"> interpret </A></code> loop is exited
without its ever knowing about it.
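<p>
The return-stack trick can be illustrated in isolation. This is a sketch of the idea only, not the FIG-Forth null word itself. The names <strong>leave-caller</strong>, <strong>endless</strong> and <strong>demo</strong> are made up, and the technique assumes a system in which the return stack holds plain return addresses, which no standard guarantees.
<p>
<p><pre>
\ Discard our own return address, so that ; returns to the caller's caller
: leave-caller ( -- ) r> drop ;
\ A loop with no exit of its own, like the FIG-Forth INTERPRET
: endless ( -- ) begin ." looping " leave-caller again ;
: demo ( -- ) endless ." escaped" cr ;
demo
</pre><p>
<p>
On a system that meets the assumption, <strong>endless</strong> never reaches its <strong>again</strong>; control resumes in <strong>demo</strong> right after the call to <strong>endless</strong>.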
<p>
The problem with the dual-loop interpreter/compiler is that the end of each line
of input from the input stream kicks the system out of whichever loop it was in.
If the user is attempting to compile a multi-line colon definition from the
input stream, he must start each line after the first with an explicit ],
because once the compiler loop is exited at the end of the first line, the
system doesn't remember that it was compiling.
<p>
One key thing to remember is that the compiler loop (which was named ]) is
executed from within the interpreter loop.
<p>
<p>
<h2>Coroutines - Patton/Berkey</h2>
<p>
At FORML 83, Bob Berkey presented a paper about using coroutines for the
interpreter loop and the compiler loop, instead of having the compiler loop run
inside the interpreter loop. This means that executing ] kicks out the
interpreter loop and runs the compiler loop instead; similarly, executing [
kicks out the compiler loop and runs the interpreter loop instead. The
subroutine versions of these loops are present in his scheme, named <strong>compiler</strong>
and <strong>interpreter</strong> .
<p>
Bob feels that this scheme is more symmetrical than the Poly-FORTH approach and
that it eliminates some of the counter-intuitive behavior.
<p>
This scheme still requires that multi-line colon definitions compiled from the
keyboard have a ] at the beginning of each line after the first.
<p>
<p>
<h2>What is Wrong with all this</h2>
<p>
These different schemes do not at all address what I consider to be the
fundamental problems with the interpreter/compiler.
<p>
<p>
<h2>Fundamental Problem #1</h2>
<p>
The compiler/interpreter has a built-in infinite loop. This means that you
can't tell it to just compile one word; once you start it, off it goes, and it
won't stop until it gets to the end of the line or screen.
<p>
<p>
<h2>Fundamental Problem #2</h2>
<p>
The reading of the next word from the input stream is buried inside this loop.
This means that you can't hand a string representing a word to the
interpreter/compiler and have it interpret or compile it for you.
<p>
<p>
<h2>Fundamental Problem #3</h2>
<p>
The behavior of the interpreter/compiler is hard to change because all the
behavior is hard-wired into one or two relatively large words. Changing this
behavior can be extremely useful for a number of applications, for example
meta-compiling.
<p>
<p>
<h2>Fundamental Problem #4</h2>
<p>
If the interpreter/compiler can't figure out what to do with a word (it's not
defined and it's not a number), it aborts. Worse yet, the aborting is not done
directly from within the loop, but inside <code><A href="_smal_AQ#280"> number </A>.</code>
This severely limits the usefulness of <code><A href="_smal_AQ#280"> number </A></code>
because if the string that <code><A href="_smal_AQ#280"> number </A></code>
gets is not recognisable as a number, it will abort on you. (The 83 standard
punts this issue by not specifying <code><A href="_smal_AQ#280"> number </A>,</code>
except as an uncontrolled reference word).
<p>
<p>
<h2>Solution</h2>
<p>
As I see it, there are several distinct things that are going on inside the
interpreter/compiler. A proper factoring of the interpreter/compiler into
several words, each of which does one thing, solves all these problems.
<p>
The outermost word is the loop. The job of the loop is to repetitively get the
next word from the input stream and do something with it. The loop should
terminate when the input stream is exhausted.
<p>
<strong>Note:</strong> this will all be changed somewhat with ANSI-Forth coming
up soon.
<p>
<p><pre>
: interpret (S -- )
begin bl word ( str )
dup c@ ( str flag )
\ flag is 0 if the input stream is exhausted
while
"compile
repeat
drop
;
</pre><p>
<p>
The next level down is the "do something with the word". This ought to be a
separate word so that it may be called by other words which would like to
compile/interpret a single word. This layer is here called <code>"<A href="_smal_AN#9d"> compile, </A></code>
because it takes a string representing a single word and compiles (or
interprets) it. <code><A href="_smal_AR#71"> "compile </A></code>'s main job is
to decide what kind of word it is dealing with. There are 3 choices. Either
the word is already defined, or it is a literal (i.e. a number), or it is
neither.
<p>
<p><pre>
: "compile ( str -- )
canonical ?comp-local ?exit
find ( str 0 | cfa -1 | cfa 1 )
dup
if do-defined
else drop literal?
if double? if do-dliteral else drop do-literal then
else fliteral? if do-fliteral else do-undefined then
then
then ;
</pre><p>
<p>
Finally, at the lowest layer, there is the code which does the appropriate thing
for each of these three possibilities. This level is represented by the words <code><A href="_smal_AU#1c4"> do-defined </A>,</code> <code><A href="_smal_AX#1c7"> do-literal </A>,</code> <code><A href="_smal_AV#1c5"> do-dliteral </A>,</code>
DO-FLITERAL, and <code><A href="_smal_BA#1c8"> do-undefined </A>.</code> It is
only at this lowest layer that the system cares at all whether it is compiling
or interpreting. One of the benefits claimed for the Poly-FORTH scheme is
speed. This is due to the elimination of tests of the <code><A href="_smal_AQ#2e0"> state </A></code>
variable within the loop.
<p>
Clearly, my scheme has to do something to distinguish between compiling and
interpreting. An obvious solution would be to test <code><A href="_smal_AQ#2e0"> state </A></code>
inside each of <code><A href="_smal_AU#1c4"> do-defined </A>,</code> <code><A href="_smal_AX#1c7"> do-literal </A>,</code> <code><A href="_smal_AV#1c5"> do-dliteral </A>,</code>
DO-FLITERAL, and <code><A href="_smal_BA#1c8"> do-undefined </A>.</code> This
would slow down the system, of course.
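<p>
A sketch of what that state-testing approach might look like for two of the words. The names <strong>state-do-literal</strong> and <strong>state-do-defined</strong> are hypothetical, and <strong>compile,</strong> is assumed to be available for appending a word to the current definition.
<p>
<p><pre>
: state-do-literal ( n -- n | )
   state @ if postpone literal then \ compiling: lay down n as a literal
;
: state-do-defined ( acf -1 | acf 1 -- ?? )
   0> if execute exit then \ immediate words always execute
   state @ if compile, else execute then
;
</pre><p>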
<p>
A more interesting alternative is to <code><A href="_smal_AE#1b4"> defer </A></code>
each of the words <code><A href="_smal_AU#1c4"> do-defined </A>,</code> <code><A href="_smal_AX#1c7"> do-literal </A>,</code> <code><A href="_smal_AV#1c5"> do-dliteral </A>,</code>
DO-FLITERAL, and <code><A href="_smal_BA#1c8"> do-undefined </A>.</code>
(Deferred words are sometimes called execution vectors. Basically they are like
variables which hold the address of a word to execute, except that the @ <code><A href="_smal_AM#1ec"> execute </A></code>
is done automatically.)
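<p>
The comparison with variables can be made concrete. In this sketch the names <strong>'greet</strong>, <strong>greet</strong>, <strong>greet2</strong>, <strong>hello</strong> and <strong>howdy</strong> are invented for illustration; only <strong>defer</strong>, <strong>is</strong> and <strong>execute</strong> come from the text.
<p>
<p><pre>
: hello ( -- ) ." hello " ;
: howdy ( -- ) ." howdy " ;

\ By hand: a variable holding the address of the word to run
variable 'greet
: greet ( -- ) 'greet @ execute ;
' hello 'greet ! greet \ runs hello
' howdy 'greet ! greet \ runs howdy

\ The same thing with a deferred word: the @ execute is implicit
defer greet2
' hello is greet2 greet2 \ runs hello
' howdy is greet2 greet2 \ runs howdy
</pre><p>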
<p>
If these words are deferred, then they can be changed when the system switches
from compiling to interpreting, and vice versa.
<p>
<p><pre>
defer literal? ( str -- n true | d true | str false )
defer do-defined ( acf -1 | acf 1 -- ?? )
defer do-literal ( literal -- ?? )
defer do-dliteral ( dliteral -- ?? )
defer do-fliteral ( float -- ?? )
defer do-undefined ( str -- )
: (literal? ( str -- str false | dliteral true )
>r r@ number? ( d f )
if r> drop true
else 2drop r> false
then
;
' (literal? is literal?
: interpret-do-defined ( acf -1 | acf 1 -- ?? )
drop execute
;
: compile-do-defined ( acf -1 | acf 1 -- )
0> if execute ( if immediate )
else , ( if not immediate )
then
;
: interpret-do-literal ( n -- n ) ;
: compile-do-literal ( n -- ) postpone literal ;
: interpret-do-dliteral ( d -- d ) ;
: compile-do-dliteral ( d -- ) postpone 2literal ;
: interpret-do-fliteral ( f -- f ) ;
: compile-do-fliteral ( f -- ) postpone fliteral ;
: interpret-do-undefined ( str -- )
count type ." ?" cr
quit
;
: compile-do-undefined ( str -- )
count type ." ?" cr
postpone lose
;
</pre><p>
<p>
Then [ and ] would be defined as follows:
<p>
<p><pre>
: [
['] interpret-do-defined is do-defined
['] interpret-do-literal is do-literal
['] interpret-do-dliteral is do-dliteral
['] interpret-do-fliteral is do-fliteral
['] interpret-do-undefined is do-undefined
state off
; immediate
: ]
['] compile-do-defined is do-defined
['] compile-do-literal is do-literal
['] compile-do-dliteral is do-dliteral
['] compile-do-fliteral is do-fliteral
['] compile-do-undefined is do-undefined
state on
;
</pre><p>
<p>
<code><A href="_smal_BP#237"> is </A></code> sets the word that a deferred word
actually executes.
<p>
Executing a deferred word does not need to be slow. Deferred words are so useful
that they should be coded in assembler for speed. In Risc-OS Forthmacs they are
only very slightly slower than normal colon definitions.
<p>
So what?
<p>
This may seem to be more complicated than the schemes it replaces. It certainly
does have more words. On the other hand, each word is individually easy to
understand, and each word does a very specific job, in contrast to the old
style, which bundles up a lot of different things in one big word. The more
explicit factoring gives you a great deal of control over the interpreter.
<p>
Here are some interesting things you can do with this new scheme:
<p>
One of my favorite words, <code><A href="_smal_AJ#219"> h# </A></code> (for
Temporary Hex):
<p>
<p><pre>
: h# ( --word ?? )
base @ >r hex
blword "compile
r> base !
; immediate
</pre><p>
<p>
This word temporarily sets the base to hexadecimal, interprets a word, and
restores the base. It works for numbers or defined words, either interpreting
or compiling.
<p>
For example:
<p>
<p><pre>
decimal
h# 10 . \ system prints--> 16
10 h# . \ system prints--> a
: strip-parity ( char -- char-without-parity )
h# 7f and
;
</pre><p>
<p>
Liberal use of this word markedly reduces the need to switch bases, especially
in source code, and thus reduces the chance of errors.
<p>
Here's a common word that is trivial to implement with this kind of interpreter:
<p>
<p><pre>
: ascii ( --name char )
bl word 1+ c@ ( char )
-1 dpl ! \ don't handle it as a double number
do-literal
; immediate
</pre><p>
<p>
Here's a word which allows you to make a new name for an old word. It is a
smart word, in the sense that when the new word is compiled, the old word will
actually be compiled instead, eliminating any performance penalty. Furthermore,
it even works for old words that are immediate! As you will see, the vectored <code><A href="_smal_AU#1c4"> do-defined </A></code>