NetNews Usenet Archive 1992 #18

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #18 / NN_1992_18.iso / spool / comp / compiler / 1412 < prev next >

Wrap

Internet Message Format | 1992-08-19 | 14.5 KB

Path: sparky!uunet!charon.amdahl.com!pacbell.com!mips!darwin.sura.net!udel!gvls1!faatcrl!iecc!compilers-sender From: burley@geech.gnu.ai.mit.edu (Craig Burley) Newsgroups: comp.compilers Subject: Re: Why is compiled basic slower than C? (Basic is the future) Keywords: interpreter, performance, design Message-ID: <92-08-111@comp.compilers> Date: 18 Aug 92 16:40:56 GMT References: <92-08-042@comp.compilers> <92-08-095@comp.compilers> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: burley@geech.gnu.ai.mit.edu (Craig Burley) Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139 Lines: 254 Approved: compilers@iecc.cambridge.ma.us macrakis@osf.org (Stavros Macrakis) writes: burley@geech.gnu.ai.mit.edu (Craig Burley) writes: I think interpreted languages _could_ compile well if _lots_ of effort were expended. However, that kind of effort probably isn't worth it -- Could you please define `interpreted language'? Is it anything more than `a language whose first implementation was interpretive'? In the present discussion, the language under question is Basic, whose first (and only for a long time) implementation was in fact compiled. Other languages you might call `interpreted', like Lisp, Prolog, ML, and Snobol, in fact have very good compilers, which make major differences in runtime. I was using the apparent prevailing definition of "interpreted language" which I then subsequently broke down in a few (but not all) ways. The prevailing definition, to me, seemed to be "a language I believe is most often implemented as an interpreter", where "I" is whoever posted the original question. Meanwhile, as I think I also pointed out in my post, BASIC was indeed originally a compiled language, and the first implementation I ever used (and hacked myself) was compiled. (Note, however, that anyone starting out using that particular implementation of BASIC would likely assume it was _interpreted_, and their coding style sometimes changed in obvious ways once they learned otherwise. A lot of what I was addressing in my post had to do with how knowledge of the implementation can govern how code is written, and this can then affect how easy it is to build a different implementation that mproves performance for existing code.) the performance gains compared to the effort, that is -- when contrasted to the effort expended in buying new, faster processors, The compiler speedup is complementary to the hardware speedup. Agreed. Memory also is becoming much more plentiful, which generally improves the ease with which compilers are written more than it does for interpreters. Nevertheless, interpreters are still, to varying degrees for each language, easier to write than compilers. Most people experience BASIC as a personal computer implementation, essentially all of which have been interpreters. That's probably because building a half-decent compiler to run on a 4K machine is much harder than building a half-decent interpreter for that machine. making the interpreters run faster (often quite easy), Interpreters can, of course, be bummed (and often are). But in many cases, this still leaves you far from the compiled speed. Obviously. But writing the compiler is the trick. Simply writing _a_ compiler is not enough -- you must write _the_ compiler that produces output that consistently and greatly exceeds the performance of the fastest interpreter likely, since the fastest interpreter is probably much easier to build than even a half-decent compiler, and is probably more portable. translating programs to maintenable code in a language designed for compilation, and so on. This is a possible approach. However, you lose whatever advantages the `interpreted' language had, e.g. more appropriate abstractions, easier debugging, fast turnaround, etc. There are of course compiler-based systems that provide easy debugging and fast turnaround e.g. most Lisp compilers, but also the CenterLine C (formerly Saber C) system. Yes, but from what little I've seen, people who write code in BASIC rarely have a problem with the speed of the resulting program. When they do, they often are "ready" to move "up" to a language designed for high-speed execution via compilation, such as C, Pascal, or Fortran, if not Assembly. It would seem to make sense that there would be a large market for a top-quality compiler for "interpreted languages" (again, languages for which most users have access only to an interpreted implementation) such as BASIC, Hypertalk, and so on, and the existence of this market would cause such a compiler to be built. As far as I know, such compilers don't permeate the market the way compilers for C, Pascal, Fortran, or perhaps even LISP do. I can't say why that is, except to do the usual "punt to market forces". I remember a discussion with rms (GNU EMACS author) regarding TECO, the language in which he first wrote EMACS. TECO was (and still is, primarily) an interpreted language (well, it's an editor, kind of like the MS-DOS DEBUG program is a partition editor :-). Teco is a weird language. Most commands are single-character, and the main loop of the interpreter essentially dispatches directly to the relevant routine. So it is a sort of human-readable (well, this is debatable, too) byte-compiled form. Obviously it is more human-readable than an object file. One doesn't have to worry about endianness. :-) He told me about how someone we both knew had developed (or help develop) a compiler for TECO and was prepared to demonstrate its superiority in benchmarks. But submitting one simple case (something like deleting every other line in the file) proved the opposite. Teco is an extreme case, but even so, you could probably _slightly_ reduce runtime by compiling to machine code. I certainly agree that in most cases, it would not be worthwhile to try to compile Teco code.... Right, and this makes the likelihood of being able to fund development of a top-quality compiler for TECO more difficult. However, it is technically feasible, probably roughly as feasible as doing a top-quality BASIC compiler. Remember, the original question was "why is compiled BASIC slower than [compiled] C?" I'm saying the answer has a lot to do with the fact that there doesn't seem to be enough of a market for a top-quality BASIC compiler that might be able to challenge the superiority of C compilers. On the other hand, an OOP BASIC compiler might be able to make a good run at most C++ implementations. (Of course, TECO is probably like APL in that the time it takes to read in source code and figure out what it means is minimal, as compared to C, FORTRAN, and COBOL. So the penalty for being an interpreter could be seen as significantly lower for terse languages like TECO and APL, Teco is not just terse, it requires no lexing or parsing. Like APL, many of its operators are bulk operators and so most runtime is within their inner loops. But even APL has been compiled with good results.... Yes, and even TECO, BASIC, and all other languages _can_ be compiled with excellent results, perhaps even superior to what can be achieved via translation to C. I know, for instance, that a lot of Fortran code cannot be translated to C without losing some speed (unless the compilers globally optimize entire programs) and without using extensions to the language that I've rarely seen implemented. The question is, why aren't there lots of great compilers for BASIC? The answer seems to be "because there aren't lots of people demanding them". ...There is another major issue regarding interpretation which puts both BASIC and C in the "compiled" camp: whether the running program has a means to modify itself at run time. LISP, for example, belongs in the "interpreted" camp. This is not really a problem. The amount of Lisp code which actually modifies the program written by the programmer is just minuscule. Much more code _generates_ pieces of Lisp, typically using the `macro' feature. But almost all cases of macro usage are handled straightforwardly in Lisp compilers. There are a few cases where Lisp code generates code on the fly, then executes it. This is handled by having an interpreter loaded along with the compiled code, and assuring that code can work correctly in such a mixed environment. In fact, most Lisp implementations _assume_ a mixed environment.... I don't feel it is a _problem_, but it is, I think, a more important _issue_ in defining what constitutes an _interpreted_ vs. _compiled_ language. Ideally, one might say a compiled language is one where all the facilities needed by a program written in that language are identifiable by analyzing the source program; an interpreted language is one where all the facilities are available to the running program at all times. By "facilities" I mean things like library routines, language constructs, I/O capabilities, and so on. By this definition, for existence, one could say IBM's JCL is a fairly pure compiled language, at least as well as I understood it way back when. Your source program had to identify _all_ the input and output files (data sets) explicitly -- no runtime determination of which file to open and when to open it. This seems artificially constraining, but it's really great for some things that are "meta-language" issues, like scheduling of tasks (you don't get "deadly embrace" in an environment where multiple JCL jobs run at the same time like you might in a traditional interactive-shell- script-as-batch-job environment). On the other hand, LISP is a fairly pure interpreted language, since, while running, a program can read in or otherwise construct an entirely new program and run that. It also can examine and modify itself as needed. Fortran and C lean more toward the compiled side, Fortran-77 more so since its memory needs are statically determinable without global interprocedural analysis. But they both offer the ability to dynamically determine names of files to be opened and closed, and both offer dynamic determination of formatting for strings. Hence, any Fortran or C program that uses a run-time format or printf-type string, especially if that run-time string is pulled from an input stream, the compiler must ensure that _all_ facilities accessible via that mechanism are available at run time. In some ways, BASIC is even more of a compiled language than Fortran and C. (However, this is based on awfully old and probably faulty knowledge of the full scope of the language.) I don't recall that it has anything quite like _run-time_ formatting of strings. On the other hand, its memory needs, unlike Fortran, are not statically determinable. (The implementation I started out with just let arrays, even string arrays, grow as much as needed, like C can do via malloc.) Of course, a big determinant of how easy it is to build a compiler that produces excellent code is how extensive the built-in type mechanism is and how much it is used. C and even Fortran are pretty well endowed in this respect because they offer a fair number of frequently useful types and programmers tend to use those types appropriately. LISP, Smalltalk, and BASIC, to varying degrees, offer less of some types (varying kinds of integers and floats) and more of others (varying-length strings, dynamic and/or sparse arrays), and programmers in those languages might not always be as careful to use the most restrictive type minimally necessary to get the job done as they are with Fortran and C. So, the compiler's job must include the ability to determine, in place of the programmer, what more restricted type can be gotten away with to produce more efficient code. C and Fortran compilers generally don't have to do this to do what is widely considered to be a good job of compilation. A good BASIC compiler probably would. Practically speaking, lots of what makes a compiler or interpreter hard or easy to implement is not so much the theoretical things as the practical. C programmers have long designed and implemented code with the idea that memory is basically flat, pointers can be passed around freely, and so on. A C interpreter would have to deal with things like how to do ioctl() or other things where it might need to determine how the programmer expected memory to be laid out. That might be easy or hard depending on how the interpreter is designed and what other problems it is solving. A Fortran interpreter that executes as it reads code without first reading the entire procedure can't be built anyway, since a DATA statement that specifies initial values for variables can be at the end of a procedure. BASIC, on the other hand, ws pretty much designed for this kind of execution, from what I can remember -- it didn't, years ago, even have to worry about "separately compiled modules" (where the initial value for a global variable used by, say, the main program is contained in some other file or procedure). And it's actually easier in some ways to build a compiler for a structured language, as far as how to handle a GOTO from one block to another, as in: ... { ... goto foo; ... } ... { ... { ... { ... foo: ... } ... } ... } ... The compiler of code like this in ANSI C or Fortran-77 has much less to worry about than a canonical interpreter, the latter having to worry about playing around with stacked blocks and so on. This may seem obvious to most of the readers here, but 've seen at least one implementation of an interpreter where the designer seemed to have forgotten about cases like this (it was for a shell language much like PL/I), or at least the code didn't handle it right. Ultimately, interpreters and compilers seem to be one of those areas in which theory is fine, but practice is what governs both perceptions and realities. There don't seem to be adequate definitions of terms (but then I'm no theorist) nor summaries of key language facilities that distinguish between the various languages described (to which could be usefully added and researched: Forth, MUMPS, Eiffel, ADA...). I wonder how hard or easy (or useful) it would be to design a language that had facilities that met today's and tomorrow's projected needs, yet lent itself well to pure interpretation, pure global compilation, and other flavors in between the extremes? -- James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.