As described previously, it is necessary to preparse arguments in order to achieve real performance gains. To support this, we introduce a type system into Tcl for parsing compile-time constants, and a system for determining which builtin commands expect which types for each argument.
Consider the `incr' command. In its first argument, it expects to be passed the name of a variable. In its second argument it (optionally) expects an integer value (a string which can be parsed as an integer). Thus, if the input contains the statement
incr a 5
we know to treat ``a'' as the variable whose name is ``a'', not as the string ``a''. Likewise, ``5'' here is the integer value 5. Note that if the ``5'' were replaced by ``$b'', for example, we could assume nothing at compile time beyond the expectation that the value of b will be a string with an integer representation. Even this is not guaranteed: since Tcl supports exception handling, a script may pass a non-integer value deliberately and catch the resulting error. The handling of these more difficult cases is discussed below in the runtime system, since these are dynamic effects that are orthogonal to the problems encountered by the compiler.
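As an illustration only, the compile-time decision for an integer-typed argument might be sketched in C as follows. The names ArgClass and classify_int_arg are hypothetical and are not part of TC. The sketch also makes a simplifying assumption: any ``$'' or ``['' in the word is taken to mean a substitution, so backslash-escaped characters are not handled.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical classification of one argument word (not TC's actual code). */
typedef enum {
    ARG_CONST_INT,   /* literal that parses as an integer at compile time */
    ARG_DEFERRED,    /* contains a substitution; value known only at runtime */
    ARG_BAD_LITERAL  /* literal with no integer form; left for the runtime */
} ArgClass;

/*
 * Classify `word` as it appears in the source text. On ARG_CONST_INT the
 * parsed value is stored in *value_out; otherwise *value_out is untouched.
 */
ArgClass classify_int_arg(const char *word, long *value_out)
{
    const char *p;
    char *end;
    long v;

    /* Assumption: any '$' or '[' signals variable or command substitution. */
    for (p = word; *p != '\0'; p++) {
        if (*p == '$' || *p == '[') {
            return ARG_DEFERRED;
        }
    }

    errno = 0;
    v = strtol(word, &end, 0);
    if (end == word || *end != '\0' || errno == ERANGE) {
        /* Not rejected here: the script may intend to catch the error. */
        return ARG_BAD_LITERAL;
    }

    *value_out = v;
    return ARG_CONST_INT;
}
```

Under these assumptions, the word ``5'' is classified as a compile-time integer constant, ``$b'' is deferred to the runtime, and a literal such as ``five'' is kept as a string so that any error is raised (and can be caught) at runtime rather than at compile time.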
It would be entirely possible to hardcode the compiler to recognize these types and the commands which use them. However, Tcl is extensible. This means that a user could author his or her own builtin which takes as an argument a list or some other large structure, which we would like to preparse. Thus, it makes more sense to allow the compiler itself to be extensible, so that such power users can compile their own commands and arguments.
For this job, Tcl itself is ideal, and hence we embed an interpreter into the compiler. This interpreter reads in a config script at startup which declares the supported data types and associates them with builtin commands. In addition to the Tcl core, the script has access to the following two commands. The backslashes indicate that the arguments should be part of a single command.
type <name> <parseproc> <codegenproc> \
     <loadproc> <printproc>
This declares a new argument type, which will use parseproc to convert string data into parsed data of this type, codegenproc to output it in binary form, loadproc to load it at runtime, and printproc to convert it back into string form for output. These procs are just names; the C callbacks statically associate each name with a real C function pointer taking specific arguments, using the hashing mechanism provided by the Tcl core library (Tcl_HashXxxx).
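The mapping from proc names to C function pointers might look roughly like the following sketch. It is an assumption-laden illustration: the ParseProc signature, the parse_integer routine, and the lookup_parse_proc function are all hypothetical, and a small linear table stands in for the Tcl hash table so that the example stays self-contained.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for a parseproc: returns 1 on success, 0 on failure. */
typedef int (*ParseProc)(const char *text, long *out);

/* One name-to-function association. */
typedef struct {
    const char *name;
    ParseProc proc;
} ProcEntry;

/* Hypothetical parseproc for an integer type. */
static int parse_integer(const char *text, long *out)
{
    char *end;
    long v = strtol(text, &end, 0);

    if (end == text || *end != '\0') {
        return 0;
    }
    *out = v;
    return 1;
}

/* Static table; the paper uses a Tcl hash table for this association. */
static const ProcEntry parse_procs[] = {
    { "parse_integer", parse_integer },
};

/* Resolve a proc name from the config script to its C function pointer. */
ParseProc lookup_parse_proc(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof parse_procs / sizeof parse_procs[0]; i++) {
        if (strcmp(parse_procs[i].name, name) == 0) {
            return parse_procs[i].proc;
        }
    }
    return NULL; /* unknown name: the caller must report the error */
}
```

With such a table, a type declaration in the config script only needs to carry the string ``parse_integer''; the lookup yields the function to call, or NULL if the name was never registered.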
builtin <name> <parseproc> <codegenproc> \
        <loadproc> <execproc> { { <type1> <argname1> } { <type2> <argname2> <default2> } ... } <return_type>
This declares a new command to be compiled specially, named <name>, with procs defined for its compilation and evaluation. Note that all but the execproc may be left empty (passed {}, which evaluates to the null string), because virtually all commands follow a standard style employing known types for each argument. When a proc is left empty, TC substitutes a default routine to process the builtin.
For each list element in the body of the builtin declaration, we specify what argument type should be passed to this command. A name is required for identification, debugging, and bookkeeping purposes. The optional third element is a default string for that argument. For example, the `incr' command is described as follows:
builtin incr {} {} {} exec_incr { { variable var } { integer i 1 } } integer
The ``1'' is preparsed as type integer and entered into the dictionary. If incr is called with only one argument, the runtime will substitute this default value for the second argument.
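The default substitution can be sketched in C as follows. This is a simplified illustration rather than TC's implementation: ArgSpec, incr_args, and bind_args are hypothetical names, and the sketch binds argument text, whereas the system described here stores the default in preparsed form.

```c
#include <stddef.h>

/* Hypothetical description of one declared argument of a builtin. */
typedef struct {
    const char *type;          /* declared type name, e.g. "integer" */
    const char *name;          /* argument name, for bookkeeping */
    const char *default_text;  /* default value, or NULL if required */
} ArgSpec;

/* Mirrors the declaration { { variable var } { integer i 1 } } for incr. */
const ArgSpec incr_args[] = {
    { "variable", "var", NULL },
    { "integer",  "i",   "1"  },
};

/*
 * Fill out[0..nspecs-1] with the supplied arguments, substituting defaults
 * for any trailing arguments that were omitted. Returns 0 on success, or -1
 * if too many arguments were given or a required argument is missing.
 */
int bind_args(const ArgSpec *specs, size_t nspecs,
              const char *const *argv, size_t argc,
              const char **out)
{
    size_t i;

    if (argc > nspecs) {
        return -1;
    }
    for (i = 0; i < nspecs; i++) {
        if (i < argc) {
            out[i] = argv[i];
        } else if (specs[i].default_text != NULL) {
            out[i] = specs[i].default_text;
        } else {
            return -1;
        }
    }
    return 0;
}
```

For a call such as incr a, binding the single supplied argument against incr_args leaves the second slot holding the default ``1'', while a call with no arguments fails because var has no default.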