Runtime Issues

As described previously, the TC runtime consists of an interpreter capable of reading byte-encoded Tcl scripts that have been preparsed as described above. To support state changes such as command renaming, the Tcl core library is modified to update the state of the byte-code interpreter whenever such changes occur.
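One way to picture this is the following Python model, which uses hypothetical names (`Slot`, `Interp.slot`, `Interp.rename`, `invoke`) rather than TC's actual data structures. Each command name is hashed once when a compiled script is loaded, the byte code keeps a direct reference to the resulting slot, and the rename hook in the core library updates the slots so that compiled references see the change.

```python
class Slot:
    """Hypothetical binding cell that compiled code holds a reference to."""
    def __init__(self):
        self.proc = None  # None means no command currently has this name

class Interp:
    def __init__(self):
        self.slots = {}  # command name -> Slot

    def slot(self, name):
        # Hash the name once, when a byte-encoded script is loaded.
        return self.slots.setdefault(name, Slot())

    def create_command(self, name, proc):
        self.slot(name).proc = proc

    def rename(self, old, new):
        # Hook in the (modified) core library: move the binding so that
        # compiled references to either name see the change immediately.
        old_slot = self.slot(old)
        self.slot(new).proc = old_slot.proc
        old_slot.proc = None

def invoke(slot, *args):
    # What compiled code does at runtime: no hashing, just follow the slot.
    if slot.proc is None:
        raise NameError("invalid command name")
    return slot.proc(*args)
```

After `interp.rename("double", "twice")`, a compiled reference obtained earlier for "double" finds an empty slot, while one for "twice" reaches the original procedure.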

It is now appropriate to discuss the actual execution of commands in this new environment. To take advantage of the preparsing the compiler has done, we need a new callback interface, because the standard argc/argv interface defeats the purpose of preparsing by taking strings as arguments.

This implies that the C callback routines must be modified for any commands that are to be compiled. This requirement is entirely reasonable. First, the new interface is very easy to construct from the argc/argv one for a given command. Second, the argc/argv interface is extremely slow because it relies on runtime parsing, and would require modification in any higher-performance system. Lastly, not all builtins need to be compiled. Only those that are called frequently or that are passed large amounts of data have a significant impact on the performance of the final application.
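The difference between the two callback styles can be sketched with a Python model of incr. The function names and the dictionary standing in for the variable table are illustrative assumptions, not the real Tcl C API; the point is that the typed version is the argc/argv version with the string parsing and formatting removed.

```python
def incr_argv(variables, argv):
    """argc/argv style: every argument and the stored value are strings,
    so each call parses them and formats the result back into a string."""
    name = argv[1]
    amount = int(argv[2]) if len(argv) > 2 else 1  # parse the argument
    value = int(variables[name]) + amount          # parse the stored value
    variables[name] = str(value)                   # format for storage
    return str(value)                              # string result

def incr_typed(variables, name, amount=1):
    """Compiled style (hypothetical): the caller passes an already-converted
    integer and the variable holds an integer, so no string parsing occurs."""
    variables[name] += amount
    return variables[name]
```

Deriving `incr_typed` from `incr_argv` amounts to deleting the `int()` and `str()` calls, which is the sense in which the new interface is easy to construct.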

This model must be extended further to minimize the parsing done at runtime, which still occurs for non-static strings. First, when the new style of compiled callbacks is in use, we change the result mechanism from returning only strings (via Tcl_AppendResult()) to returning already-parsed data through a typed data pointer. This allows the result of one command to be passed directly to another without reparsing. For example:

  incr a [expr 4+5]

Assuming that both incr and expr are compiled, we would like expr to return the integer value 9, not the string ``9'', so that it can be passed directly to incr without reparsing. Such reparsing can be avoided only if expr does not convert its result into string form; hence the need for this new style of return.

We also need to change the way that variables are stored in the Tcl interpreter. If we take the above example and modify it to read

  set b [expr 4+5]
  incr a $b

it becomes clear that if the value of ``b'' cannot be stored in parsed form, we cannot avoid reparsing it before it is used by incr. Thus, TC ``dual-ports'' its variables, storing both a string pointer and a compiled data value (the ``typed data field''). Now, if expr returns an integer, ``b'' stores it in the typed data field and invalidates the string field; incr can then be called directly with this parsed value. If ``b'' held a value of some other type, it would need to be converted first to a string and then to an integer, as Tcl currently does implicitly.
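A minimal sketch of a dual-ported variable, written as a Python model under the assumption that the typed field holds only integers. `DualPortVar`, `read_int`, `write_typed` and the `parses` counter are illustrative names, not TC's; `None` marks a field as invalid.

```python
class DualPortVar:
    """Hypothetical dual-ported variable: a string field and a typed field.
    None marks a field as invalid; at least one field is always valid."""
    parses = 0  # class-wide count of string-to-integer conversions

    def __init__(self, string=None, typed=None):
        self.string = string
        self.typed = typed

    def write_typed(self, value):
        # e.g. storing expr's integer result: typed form updated,
        # string form invalidated
        self.typed, self.string = value, None

    def read_int(self):
        if self.typed is None:
            DualPortVar.parses += 1
            self.typed = int(self.string)  # parse once and keep it cached
        return self.typed

    def read_string(self):
        if self.string is None:
            self.string = str(self.typed)  # cached until the next write
        return self.string

def incr(var, amount):
    # Compiled-style incr: receives an already-typed amount.
    var.write_typed(var.read_int() + amount)
```

Modeling `set b [expr 4+5]` as `DualPortVar(typed=9)` and `incr a $b` as `incr(a, b.read_int())` passes 9 to incr without any parse of ``b''; the only parse is of a's own initial string.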

If this seems overly complex, recall that Tcl is a typeless language. Lists, boolean expressions, integers and so on are not distinguished by Tcl; bad data is detected only when an individual command reports an error. The following code will execute without error:

  proc cdr {list} { 
      return [lrange $list 1 end]
  }
  set i 2; set j 3; set k "3 4"
  linsert "$j $k" [cdr {1 2}] $i$j

The result of the final command is ``3 3 23 4''. During the course of execution we have implicitly converted strings to integers, strings to lists, lists to integers, and integers to strings. While this is clearly not an example of good coding practice, it is legal Tcl input, and similar usages may well appear in real source code. The TC runtime must therefore be able to perform these conversions smoothly.

The ``dual-port'' implementation provides sufficient machinery to coerce data into parsed form and keep it there as long as possible. The rule for executing a statement is now simple: convert each argument to the type the command expects, then call the compiled callback once all the arguments are assembled. The conversion rules amount to treating the two data fields as caches of the value. If the data is required in a specific type and does not currently exist in that form, it is converted. If the destination type is a string, the converted value is stored in the string field; otherwise, it is stored in the typed data field. Writing a field is handled as follows:

  Currently valid   Writer   Action
  ---------------   ------   -------------------------------------------
  string only       string   string form updated
  typed only        string   string form updated, typed form invalidated
  both              string   string form updated, typed form invalidated
  typed only        typed    typed form updated
  string only       typed    typed form updated, string form invalidated
  both              typed    typed form updated, string form invalidated
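The write rules can be checked against a small Python model. The class and method names are hypothetical, and validity is tracked with explicit flags. Writes follow the table, while reads treat the two fields as caches: a missing form is produced by conversion and stored without invalidating the other.

```python
class DualPort:
    """Hypothetical model of the write rules: two cached forms of one value,
    each with a validity flag."""

    def __init__(self):
        self.string, self.string_valid = None, False
        self.typed, self.typed_valid = None, False

    def write_string(self, s):
        # string form updated; typed form (if any) invalidated
        self.string, self.string_valid = s, True
        self.typed_valid = False

    def write_typed(self, v):
        # typed form updated; string form (if any) invalidated
        self.typed, self.typed_valid = v, True
        self.string_valid = False

    def read_string(self):
        # Conversion fills the missing cache without invalidating the other.
        if not self.string_valid:
            self.string, self.string_valid = str(self.typed), True
        return self.string

    def read_int(self):
        if not self.typed_valid:
            self.typed, self.typed_valid = int(self.string), True
        return self.typed

    def state(self):
        if self.string_valid and self.typed_valid:
            return "both"
        if self.string_valid:
            return "string only"
        return "typed only" if self.typed_valid else "unset"
```

Reading a value in the other form moves it to "both"; any write moves it back to exactly one valid form, as in the table.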

Consider the following example:

  set a 2
  set b [expr 4*$a]
  incr b
  puts stdout $b$a

When a is initially set to ``2'', its value becomes the string ``2'' rather than the integer 2, because we cannot assume that it will later be used as an integer (and data could be lost if we guessed wrong). It is used in the expr call without curly braces, so again we cannot make compile-time assumptions about the parsing of ``4*$a'' (consider what happens if a is reset to the string ``4+5''). As grim as this seems, b still receives the integer value 8, because the call to expr returned a numeric value. Hence the call to incr requires no parsing. In the puts call of the last statement, b must be converted to string form, but a need not be.
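The walk-through can be traced with a toy Python model that counts conversions. `expr_runtime` is a hypothetical stand-in that handles only expressions of the form `N*M`, representing the runtime parse forced by the unbraced argument; the variables follow the dual-port rules described above.

```python
# Hypothetical trace of:  set a 2; set b [expr 4*$a]; incr b; puts stdout $b$a

class Var:
    """Dual-ported variable; None marks an invalid field."""
    def __init__(self, string=None, typed=None):
        self.string, self.typed = string, typed

    def write_typed(self, value):
        self.typed, self.string = value, None  # string form invalidated

    def read_string(self, stats):
        if self.string is None:
            stats["to_string"] += 1
            self.string = str(self.typed)
        return self.string

    def read_int(self, stats):
        if self.typed is None:
            stats["to_int"] += 1
            self.typed = int(self.string)
        return self.typed

def expr_runtime(text, stats):
    # Toy stand-in for expr on an unbraced argument: the text is known only
    # at runtime, so it is parsed here. Handles "N*M" only.
    stats["expr_parses"] += 1
    left, right = text.split("*")
    return int(left) * int(right)

def run():
    stats = {"to_int": 0, "to_string": 0, "expr_parses": 0}
    a = Var(string="2")                                          # set a 2
    b = Var(typed=expr_runtime("4*" + a.read_string(stats), stats))  # set b [expr 4*$a]
    b.write_typed(b.read_int(stats) + 1)                         # incr b
    output = b.read_string(stats) + a.read_string(stats)         # puts stdout $b$a
    return output, a, b, stats
```

Running `run()` yields the output "92" with no string-to-integer conversion of either variable and a single integer-to-string conversion (of b); a never acquires a typed form.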

To see the real benefits of this system, consider a more realistic example that sums the first thousand non-negative integers (the hard way):

  set sum 0
  for {set counter 0} {$counter<1000} {incr counter} {
      incr sum $counter
  }

In the standard Tcl interpreter, this loop requires 6000 hash lookups of the strings ``counter'', ``sum'' and ``incr''; the value of ``counter'' is parsed 3000 times and the value of ``sum'' 1000 times; and the comparison string is parsed 1000 times. In TC, no hashing occurs at runtime (it is done at load time), each value is parsed exactly once (during the first iteration, when it is converted to an integer), and the comparison string is preparsed, so it requires no effort at runtime.
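The counts can be reproduced with a rough Python simulation; it is a model, not a measurement of either interpreter. The hypothetical `run` function executes the loop either with string-only variables and per-access name hashing, or TC-style with names resolved at load time and integer values cached in a dual-ported `Cell`. Because the simulation also counts the final, failing loop test, the string-only figures come out one higher than the round numbers in the text: 6001 hashes, 3001 parses of ``counter'' plus 1000 of ``sum'', and 1001 parses of the comparison string.

```python
class Cell:
    """Dual-ported variable: string form plus a cached integer form."""
    def __init__(self, text):
        self.text, self.num = text, None

def run(n, tc_style):
    """Sum 0..n-1 the way the Tcl loop does, counting the work performed."""
    stats = {"hashes": 0, "value_parses": 0, "cond_parses": 0}
    table = {"sum": Cell("0"), "counter": Cell("0")}

    def var(name):
        # String-only Tcl hashes the variable name on every access; TC
        # resolved it at load time, so no runtime hash is counted.
        if not tc_style:
            stats["hashes"] += 1
        return table[name]

    def as_int(cell):
        if tc_style and cell.num is not None:
            return cell.num                     # cached typed value: no parse
        stats["value_parses"] += 1
        cell.num = int(cell.text)
        return cell.num

    def store_int(cell, value):
        if tc_style:
            cell.num, cell.text = value, None   # typed form only
        else:
            cell.text, cell.num = str(value), None  # string form only

    def incr(name, amount):
        if not tc_style:
            stats["hashes"] += 1                # look up the command "incr"
        cell = var(name)
        store_int(cell, as_int(cell) + amount)

    while True:
        if not tc_style:
            stats["cond_parses"] += 1           # reparse "$counter<1000"
        if not as_int(var("counter")) < n:
            break
        incr("sum", as_int(var("counter")))     # incr sum $counter
        incr("counter", 1)                      # incr counter
    total = table["sum"]
    return (total.num if total.num is not None else int(total.text)), stats
```

In TC-style mode the same loop performs no runtime hashing and exactly two value parses, one for each variable's initial string ``0''.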