home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!charon.amdahl.com!pacbell.com!mips!darwin.sura.net!udel!gvls1!faatcrl!iecc!compilers-sender
- From: burley@geech.gnu.ai.mit.edu (Craig Burley)
- Newsgroups: comp.compilers
- Subject: Re: Why is compiled basic slower than C? (Basic is the future)
- Keywords: interpreter, performance, design
- Message-ID: <92-08-111@comp.compilers>
- Date: 18 Aug 92 16:40:56 GMT
- References: <92-08-042@comp.compilers> <92-08-095@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: burley@geech.gnu.ai.mit.edu (Craig Burley)
- Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
- Lines: 254
- Approved: compilers@iecc.cambridge.ma.us
-
- macrakis@osf.org (Stavros Macrakis) writes:
-
- burley@geech.gnu.ai.mit.edu (Craig Burley) writes:
-
- I think interpreted languages _could_ compile well if _lots_ of effort
- were expended. However, that kind of effort probably isn't worth it --
-
- Could you please define `interpreted language'? Is it anything more than
- `a language whose first implementation was interpretive'? In the present
- discussion, the language under question is Basic, whose first (and only
- for a long time) implementation was in fact compiled. Other languages you
- might call `interpreted', like Lisp, Prolog, ML, and Snobol, in fact have
- very good compilers, which make major differences in runtime.
-
- I was using the apparent prevailing definition of "interpreted language"
- which I then subsequently broke down in a few (but not all) ways. The
- prevailing definition, to me, seemed to be "a language I believe is most
- often implemented as an interpreter", where "I" is whoever posted the
- original question.
-
- Meanwhile, as I think I also pointed out in my post, BASIC was indeed
- originally a compiled language, and the first implementation I ever used
- (and hacked myself) was compiled. (Note, however, that anyone starting
- out using that particular implementation of BASIC would likely assume it
- was _interpreted_, and their coding style sometimes changed in obvious
- ways once they learned otherwise. A lot of what I was addressing in my
- post had to do with how knowledge of the implementation can govern how
- code is written, and this can then affect how easy it is to build a
- different implementation that mproves performance for existing code.)
-
- the performance gains compared to the effort, that is -- when contrasted
- to the effort expended in buying new, faster processors,
-
- The compiler speedup is complementary to the hardware speedup.
-
- Agreed. Memory also is becoming much more plentiful, which generally
- improves the ease with which compilers are written more than it does for
- interpreters. Nevertheless, interpreters are still, to varying degrees
- for each language, easier to write than compilers. Most people experience
- BASIC as a personal computer implementation, essentially all of which have
- been interpreters. That's probably because building a half-decent
- compiler to run on a 4K machine is much harder than building a half-decent
- interpreter for that machine.
-
- making the interpreters run faster (often quite easy),
-
- Interpreters can, of course, be bummed (and often are). But in many
- cases, this still leaves you far from the compiled speed.
-
- Obviously. But writing the compiler is the trick. Simply writing _a_
- compiler is not enough -- you must write _the_ compiler that produces
- output that consistently and greatly exceeds the performance of the
- fastest interpreter likely, since the fastest interpreter is probably much
- easier to build than even a half-decent compiler, and is probably more
- portable.
-
- translating programs to
- maintenable code in a language designed for compilation, and so on.
-
- This is a possible approach. However, you lose whatever advantages the
- `interpreted' language had, e.g. more appropriate abstractions, easier
- debugging, fast turnaround, etc. There are of course compiler-based
- systems that provide easy debugging and fast turnaround e.g. most Lisp
- compilers, but also the CenterLine C (formerly Saber C) system.
-
- Yes, but from what little I've seen, people who write code in BASIC rarely
- have a problem with the speed of the resulting program. When they do,
- they often are "ready" to move "up" to a language designed for high-speed
- execution via compilation, such as C, Pascal, or Fortran, if not Assembly.
-
- It would seem to make sense that there would be a large market for a
- top-quality compiler for "interpreted languages" (again, languages for
- which most users have access only to an interpreted implementation) such
- as BASIC, Hypertalk, and so on, and the existence of this market would
- cause such a compiler to be built. As far as I know, such compilers don't
- permeate the market the way compilers for C, Pascal, Fortran, or perhaps
- even LISP do. I can't say why that is, except to do the usual "punt to
- market forces".
-
- I remember a discussion with rms (GNU EMACS author) regarding TECO, the
- language in which he first wrote EMACS. TECO was (and still is,
- primarily) an interpreted language (well, it's an editor, kind of like
- the MS-DOS DEBUG program is a partition editor :-).
-
- Teco is a weird language. Most commands are single-character, and the
- main loop of the interpreter essentially dispatches directly to the
- relevant routine. So it is a sort of human-readable (well, this is
- debatable, too) byte-compiled form.
-
- Obviously it is more human-readable than an object file. One doesn't have
- to worry about endianness. :-)
-
- He told me about how someone we both knew had developed (or help
- develop) a compiler for TECO and was prepared to demonstrate its
- superiority in benchmarks. But submitting one simple case
- (something like deleting every other line in the file) proved the
- opposite.
-
- Teco is an extreme case, but even so, you could probably _slightly_ reduce
- runtime by compiling to machine code. I certainly agree that in most
- cases, it would not be worthwhile to try to compile Teco code....
-
- Right, and this makes the likelihood of being able to fund development of
- a top-quality compiler for TECO more difficult. However, it is
- technically feasible, probably roughly as feasible as doing a top-quality
- BASIC compiler.
-
- Remember, the original question was "why is compiled BASIC slower than
- [compiled] C?" I'm saying the answer has a lot to do with the fact that
- there doesn't seem to be enough of a market for a top-quality BASIC
- compiler that might be able to challenge the superiority of C compilers.
- On the other hand, an OOP BASIC compiler might be able to make a good run
- at most C++ implementations.
-
- (Of course, TECO is probably like APL in that the time
- it takes to read in source code and figure out what it means is
- minimal, as compared to C, FORTRAN, and COBOL. So the penalty for
- being an interpreter could be seen as significantly lower for terse
- languages like TECO and APL,
-
- Teco is not just terse, it requires no lexing or parsing. Like APL, many
- of its operators are bulk operators and so most runtime is within their
- inner loops. But even APL has been compiled with good results....
-
- Yes, and even TECO, BASIC, and all other languages _can_ be compiled with
- excellent results, perhaps even superior to what can be achieved via
- translation to C. I know, for instance, that a lot of Fortran code cannot
- be translated to C without losing some speed (unless the compilers
- globally optimize entire programs) and without using extensions to the
- language that I've rarely seen implemented.
-
- The question is, why aren't there lots of great compilers for BASIC? The
- answer seems to be "because there aren't lots of people demanding them".
-
- ...There is another major issue regarding interpretation which puts
- both BASIC and C in the "compiled" camp: whether the running
- program has a means to modify itself at run time. LISP, for
- example, belongs in the "interpreted" camp.
-
- This is not really a problem. The amount of Lisp code which actually
- modifies the program written by the programmer is just minuscule. Much
- more code _generates_ pieces of Lisp, typically using the `macro' feature.
- But almost all cases of macro usage are handled straightforwardly in Lisp
- compilers. There are a few cases where Lisp code generates code on the
- fly, then executes it. This is handled by having an interpreter loaded
- along with the compiled code, and assuring that code can work correctly in
- such a mixed environment. In fact, most Lisp implementations _assume_ a
- mixed environment....
-
- I don't feel it is a _problem_, but it is, I think, a more important
- _issue_ in defining what constitutes an _interpreted_ vs. _compiled_
- language. Ideally, one might say a compiled language is one where all the
- facilities needed by a program written in that language are identifiable
- by analyzing the source program; an interpreted language is one where all
- the facilities are available to the running program at all times. By
- "facilities" I mean things like library routines, language constructs, I/O
- capabilities, and so on.
-
- By this definition, for existence, one could say IBM's JCL is a fairly
- pure compiled language, at least as well as I understood it way back when.
- Your source program had to identify _all_ the input and output files (data
- sets) explicitly -- no runtime determination of which file to open and when
- to open it. This seems artificially constraining, but it's really great
- for some things that are "meta-language" issues, like scheduling of tasks
- (you don't get "deadly embrace" in an environment where multiple JCL jobs
- run at the same time like you might in a traditional interactive-shell-
- script-as-batch-job environment).
-
- On the other hand, LISP is a fairly pure interpreted language, since,
- while running, a program can read in or otherwise construct an entirely
- new program and run that. It also can examine and modify itself as
- needed.
-
- Fortran and C lean more toward the compiled side, Fortran-77 more so since
- its memory needs are statically determinable without global
- interprocedural analysis. But they both offer the ability to dynamically
- determine names of files to be opened and closed, and both offer dynamic
- determination of formatting for strings. Hence, any Fortran or C program
- that uses a run-time format or printf-type string, especially if that
- run-time string is pulled from an input stream, the compiler must ensure
- that _all_ facilities accessible via that mechanism are available at run
- time.
-
- In some ways, BASIC is even more of a compiled language than Fortran and
- C. (However, this is based on awfully old and probably faulty knowledge
- of the full scope of the language.) I don't recall that it has anything
- quite like _run-time_ formatting of strings. On the other hand, its
- memory needs, unlike Fortran, are not statically determinable. (The
- implementation I started out with just let arrays, even string arrays,
- grow as much as needed, like C can do via malloc.)
-
- Of course, a big determinant of how easy it is to build a compiler that
- produces excellent code is how extensive the built-in type mechanism is
- and how much it is used. C and even Fortran are pretty well endowed in
- this respect because they offer a fair number of frequently useful types
- and programmers tend to use those types appropriately. LISP, Smalltalk,
- and BASIC, to varying degrees, offer less of some types (varying kinds of
- integers and floats) and more of others (varying-length strings, dynamic
- and/or sparse arrays), and programmers in those languages might not always
- be as careful to use the most restrictive type minimally necessary to get
- the job done as they are with Fortran and C. So, the compiler's job must
- include the ability to determine, in place of the programmer, what more
- restricted type can be gotten away with to produce more efficient code. C
- and Fortran compilers generally don't have to do this to do what is widely
- considered to be a good job of compilation. A good BASIC compiler
- probably would.
-
- Practically speaking, lots of what makes a compiler or interpreter hard or
- easy to implement is not so much the theoretical things as the practical.
- C programmers have long designed and implemented code with the idea that
- memory is basically flat, pointers can be passed around freely, and so on.
- A C interpreter would have to deal with things like how to do ioctl() or
- other things where it might need to determine how the programmer expected
- memory to be laid out. That might be easy or hard depending on how the
- interpreter is designed and what other problems it is solving. A Fortran
- interpreter that executes as it reads code without first reading the
- entire procedure can't be built anyway, since a DATA statement that
- specifies initial values for variables can be at the end of a procedure.
- BASIC, on the other hand, ws pretty much designed for this kind of
- execution, from what I can remember -- it didn't, years ago, even have to
- worry about "separately compiled modules" (where the initial value for a
- global variable used by, say, the main program is contained in some other
- file or procedure).
-
- And it's actually easier in some ways to build a compiler for a structured
- language, as far as how to handle a GOTO from one block to another, as in:
-
- ... { ... goto foo; ... }
-
- ... { ... { ... { ... foo: ... } ... } ... } ...
-
- The compiler of code like this in ANSI C or Fortran-77 has much less to
- worry about than a canonical interpreter, the latter having to worry about
- playing around with stacked blocks and so on. This may seem obvious to
- most of the readers here, but 've seen at least one implementation of an
- interpreter where the designer seemed to have forgotten about cases like
- this (it was for a shell language much like PL/I), or at least the code
- didn't handle it right.
-
- Ultimately, interpreters and compilers seem to be one of those areas in
- which theory is fine, but practice is what governs both perceptions and
- realities. There don't seem to be adequate definitions of terms (but then
- I'm no theorist) nor summaries of key language facilities that distinguish
- between the various languages described (to which could be usefully added
- and researched: Forth, MUMPS, Eiffel, ADA...). I wonder how hard or easy
- (or useful) it would be to design a language that had facilities that met
- today's and tomorrow's projected needs, yet lent itself well to pure
- interpretation, pure global compilation, and other flavors in between the
- extremes?
- --
- James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-