home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.arch
- Path: sparky!uunet!cis.ohio-state.edu!news.sei.cmu.edu!firth
- From: firth@sei.cmu.edu (Robert Firth)
- Subject: Re: Scientists as Programmers
- Message-ID: <1992Sep4.151001.9886@sei.cmu.edu>
- Organization: Software Engineering Institute
- References: <1992Aug31.195540.13074@ctr.columbia.edu> <1992Sep1.115849.13522@relay.nswc.navy.mil> <Btx4vF.Jx6@mentor.cc.purdue.edu>
- Date: Fri, 4 Sep 1992 15:10:01 GMT
- Lines: 151
-
- In article <Btx4vF.Jx6@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes:
-
- >What is relevant is
- >the translating of the problem to be computed into the collection of bits
- >which the machine can understand to accomplish the task. This involves
- >
- > Knowing the algorithmic processes which can be applied, their
- > strengths and weaknesses, and what happens to them on the particular
- > machine when they are translated using the available languages.
-
-
- Whether scientific programmers need to understand the hardware is a
- debateable topic, but on the issue of fact that Dr Rubin raises, please
- let me add my firm support to his assertion that most programmers,
- scientific or otherwise, in fact do not understand the machinery
- beneath their feet. Allow me to relate a story from my own experience,
- which is perhaps of some relevance to the issue of machine and system
- architectures.
-
- The problem to be solved was this: you are writing a compiler for a
- language that has separately compiled units, which I'll call "modules".
- A module may declare types, procedures, variables &c. It may also
- include a body of executable code called "initialisation code". The
- rules of the language are: first, that all initialisation code must
- be executed, once only, before the "main program" runs; and, secondly,
- that if module A references or "imports" module B, then the initialisation
- code of B must be executed before that of A. Circular imports, of course,
- are forbidden.
-
- What is the appropriate codegeneration strategy to deal with this? Well,
- back in the bad old days, when programmers were amateurs trained in
- English literature, biochemistry or quantum mechanics (as was the case
- with the first programming team of which I was a member), we had a pretty
- simple-minded and straightforward way to do this. We would create a
- separate code section, something like
-
- .PSECT INIT,GBL,EXE,PIC,RO
-
- and decide that all initialisation code would go in that section. So, if
- a module had some initialistaion code, we'd spit out the .PSECT directive,
- then the code, and then restore the status quo ante.
-
- At link time, we'd inspect the import lists, carefully squirrelled away in
- auxiliary files, construct a total module order consistent with the partial
- order of the import graph, and proffer the object modules to the linker in
- that order. The linker would obligingly concatenate all the initialisation
- code into the one Psect; we'd then top and tail it with two tiny hand-written
- modules that put a procedure entry at the top and an exit at the bottom;
- and, somewhere in the startup code, we'd make sure there was a call to the
- init code before we entered the main program.
-
- Simple, and efficient. Note that this obeys the rules, ensures that all
- the init code is executed, once only and in the right order, and with the
- overhead of a single procedure call. Moreover, the code is contiguous,
- and if we have a clever system, we can mark that whole Psect as a cluster,
- to be paged as a unit, and even mark it to be resident on program load.
- After it's executed, of course, it will obligingly page itself out pretty
- quickly (or we can even do that by hand), and disappear from main store,
- ne'er to be found again.
-
- The way we arrived at this scheme was pretty simple, too: we worked
- backwards. First, visualise what you want to happen at run time; next,
- design the structures that must exist inside the machine to make it
- happen; finally, distribute the work of creating those structures among
- the compiler, codegenerator, linker, loader &c so that each does the
- right and appropriate thing.
-
- As I said, that was in the bad old days. Welcome to the worse new days,
- for here is how the same problem was solved by a team of trained
- professionals, working for a real company, writing a real, for-profit
- compiler that people were expected to pay for. I was there; I saw it.
-
- The initialisation code of a module was converted into a closed procedure
-
- procedure A_Init
- begin
- ...stuff...
- end
-
- and emitted along with the rest of the code in a single section. If the
- module A called module B, then the initialisation of B was invoked by
-
- procedure A_Init
- begin
- call B_Init
- ...stuff...
- end
-
- Of course, at this point, A didn't actually know whether B contained any
- initialisation code, but it called the procedure anyway. Hence, a module
- with no init code had to declare a procedure with a null body, just in
- case.
-
- Now, you are wondering, what if two modules import B, won't the init code
- be called twice? Yes, and this was solved by a Cunning Ruse:
-
- Init_Done: Boolean := False
-
- procedure A_Init
- begin
- if not Init_Done then
- Init_Done := True
- call B_Init
- ...stuff...
- end
- end
-
- So, if a module was imported 648 times, then the init code was called 648
- times, and on the last 647 calls, it did nothing.
-
- This scheme is evidently correct, since - good grief Moriarty - the call
- graph of the Init_Code procedures is isomorphic to the import graph of
- the modules. But look at the cost: not one procedure call for each
- module with init code; not even one call per module, but one call for
- every ARC of the import graph! That's a lotta calls. Moreover, that
- code is scattered through the whole core image. That's a lotta page
- faults. And those tiny static Boolean variables Init_Done are also
- scattered through the data section, so double those page faults.
-
- Now, these people were not fools - they were PhD's in computer science
- and some of the brightest folk you could meet. And this was not a cheap
- and cheerful hack; it was supposed to be an industrial strength optimising
- compiler with register tweaking, loop scrunching, dead code eulogisation,
- you name it... So why, then, did they implement this simple feature in
- the most grotesquely inefficient way imaginable, with a design one would
- expect a developmentally-challenged tree sloth to be able to pick to
- pieces in ten seconds?
-
- I believe the answer is this: these people had between them about 30 years'
- eduucation in academic computer science. They knew about stacks, queues,
- lambda calculi, invariants, NP-completeness; they could use words like
- "isomorphic" over morning coffee. And, in all that time, they had never
- been taught to visualise *what really happens when real code runs on
- real machines*.
-
- This problem is pervasive. Each generation sees a new and more monstrous
- bloat in the size and cycle consumption of what I still think of as basic
- system software - device drivers, editors, compilers, linkers. And the
- reason the programmers are so profligate of resources is that they have
- never been taught that resources matter - they are told that, in computer
- science, "everything is virtual", and all resources are infinite and free.
- There is no technical perception of the machine as a real engine, with
- finite capability, and no aesthetic perception that economy of means is
- one of the cardinal qualities of good engineering.
-
- This must be changed. Or real companies, with real problems, will continue
- to hire english majors, biochemists and quantum mechanics to write their
- programs; because, as a senior member of one such company told me: "It is
- cheaper to train than to re-train."
-
- Robert Firth
-