NetNews Usenet Archive 1992 #19

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #19 / NN_1992_19.iso / spool / comp / arch / 9227 < prev next >

Wrap

Text File | 1992-09-04 | 7.5 KB | 162 lines

Newsgroups: comp.arch Path: sparky!uunet!cis.ohio-state.edu!news.sei.cmu.edu!firth From: firth@sei.cmu.edu (Robert Firth) Subject: Re: Scientists as Programmers Message-ID: <1992Sep4.151001.9886@sei.cmu.edu> Organization: Software Engineering Institute References: <1992Aug31.195540.13074@ctr.columbia.edu> <1992Sep1.115849.13522@relay.nswc.navy.mil> <Btx4vF.Jx6@mentor.cc.purdue.edu> Date: Fri, 4 Sep 1992 15:10:01 GMT Lines: 151 In article <Btx4vF.Jx6@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: >What is relevant is >the translating of the problem to be computed into the collection of bits >which the machine can understand to accomplish the task. This involves > > Knowing the algorithmic processes which can be applied, their > strengths and weaknesses, and what happens to them on the particular > machine when they are translated using the available languages. Whether scientific programmers need to understand the hardware is a debateable topic, but on the issue of fact that Dr Rubin raises, please let me add my firm support to his assertion that most programmers, scientific or otherwise, in fact do not understand the machinery beneath their feet. Allow me to relate a story from my own experience, which is perhaps of some relevance to the issue of machine and system architectures. The problem to be solved was this: you are writing a compiler for a language that has separately compiled units, which I'll call "modules". A module may declare types, procedures, variables &c. It may also include a body of executable code called "initialisation code". The rules of the language are: first, that all initialisation code must be executed, once only, before the "main program" runs; and, secondly, that if module A references or "imports" module B, then the initialisation code of B must be executed before that of A. Circular imports, of course, are forbidden. What is the appropriate codegeneration strategy to deal with this? Well, back in the bad old days, when programmers were amateurs trained in English literature, biochemistry or quantum mechanics (as was the case with the first programming team of which I was a member), we had a pretty simple-minded and straightforward way to do this. We would create a separate code section, something like .PSECT INIT,GBL,EXE,PIC,RO and decide that all initialisation code would go in that section. So, if a module had some initialistaion code, we'd spit out the .PSECT directive, then the code, and then restore the status quo ante. At link time, we'd inspect the import lists, carefully squirrelled away in auxiliary files, construct a total module order consistent with the partial order of the import graph, and proffer the object modules to the linker in that order. The linker would obligingly concatenate all the initialisation code into the one Psect; we'd then top and tail it with two tiny hand-written modules that put a procedure entry at the top and an exit at the bottom; and, somewhere in the startup code, we'd make sure there was a call to the init code before we entered the main program. Simple, and efficient. Note that this obeys the rules, ensures that all the init code is executed, once only and in the right order, and with the overhead of a single procedure call. Moreover, the code is contiguous, and if we have a clever system, we can mark that whole Psect as a cluster, to be paged as a unit, and even mark it to be resident on program load. After it's executed, of course, it will obligingly page itself out pretty quickly (or we can even do that by hand), and disappear from main store, ne'er to be found again. The way we arrived at this scheme was pretty simple, too: we worked backwards. First, visualise what you want to happen at run time; next, design the structures that must exist inside the machine to make it happen; finally, distribute the work of creating those structures among the compiler, codegenerator, linker, loader &c so that each does the right and appropriate thing. As I said, that was in the bad old days. Welcome to the worse new days, for here is how the same problem was solved by a team of trained professionals, working for a real company, writing a real, for-profit compiler that people were expected to pay for. I was there; I saw it. The initialisation code of a module was converted into a closed procedure procedure A_Init begin ...stuff... end and emitted along with the rest of the code in a single section. If the module A called module B, then the initialisation of B was invoked by procedure A_Init begin call B_Init ...stuff... end Of course, at this point, A didn't actually know whether B contained any initialisation code, but it called the procedure anyway. Hence, a module with no init code had to declare a procedure with a null body, just in case. Now, you are wondering, what if two modules import B, won't the init code be called twice? Yes, and this was solved by a Cunning Ruse: Init_Done: Boolean := False procedure A_Init begin if not Init_Done then Init_Done := True call B_Init ...stuff... end end So, if a module was imported 648 times, then the init code was called 648 times, and on the last 647 calls, it did nothing. This scheme is evidently correct, since - good grief Moriarty - the call graph of the Init_Code procedures is isomorphic to the import graph of the modules. But look at the cost: not one procedure call for each module with init code; not even one call per module, but one call for every ARC of the import graph! That's a lotta calls. Moreover, that code is scattered through the whole core image. That's a lotta page faults. And those tiny static Boolean variables Init_Done are also scattered through the data section, so double those page faults. Now, these people were not fools - they were PhD's in computer science and some of the brightest folk you could meet. And this was not a cheap and cheerful hack; it was supposed to be an industrial strength optimising compiler with register tweaking, loop scrunching, dead code eulogisation, you name it... So why, then, did they implement this simple feature in the most grotesquely inefficient way imaginable, with a design one would expect a developmentally-challenged tree sloth to be able to pick to pieces in ten seconds? I believe the answer is this: these people had between them about 30 years' eduucation in academic computer science. They knew about stacks, queues, lambda calculi, invariants, NP-completeness; they could use words like "isomorphic" over morning coffee. And, in all that time, they had never been taught to visualise *what really happens when real code runs on real machines*. This problem is pervasive. Each generation sees a new and more monstrous bloat in the size and cycle consumption of what I still think of as basic system software - device drivers, editors, compilers, linkers. And the reason the programmers are so profligate of resources is that they have never been taught that resources matter - they are told that, in computer science, "everything is virtual", and all resources are infinite and free. There is no technical perception of the machine as a real engine, with finite capability, and no aesthetic perception that economy of means is one of the cardinal qualities of good engineering. This must be changed. Or real companies, with real problems, will continue to hire english majors, biochemists and quantum mechanics to write their programs; because, as a senior member of one such company told me: "It is cheaper to train than to re-train." Robert Firth