Xref: sparky comp.arch:10545 comp.lang.forth:3465
Newsgroups: comp.arch,comp.lang.forth
Path: sparky!uunet!think.com!yale.edu!ira.uka.de!math.fu-berlin.de!informatik.tu-muenchen.de!pazsan
From: pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan)
Subject: Re: What's RIGHT with stack machines
References: <Bx5AIr.EAy.2@cs.cmu.edu> <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at>
Originator: pazsan@hphalle5f.informatik.tu-muenchen.de
Sender: news@Informatik.TU-Muenchen.DE (USENET Newssystem)
Organization: Technische Universitaet Muenchen, Germany
Date: Mon, 9 Nov 1992 15:53:48 GMT
Message-ID: <1992Nov9.155348.6079@Informatik.TU-Muenchen.DE>
Lines: 83

In article <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at>, mike@vlsivie.tuwien.ac.at (Michael Gschwind) writes:
|> In article <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan) writes:
|> Thanks for your interesting report. I have little to add.
|>
|> |> Code size and performance: reduced code size for the same operations means
|> |> less bus bandwidth is required. As busses (and DRAMs) are the von Neumann
|> |> bottleneck today, stack CPUs reduce the cost of main memory and second
|> |> level caches and may increase performance if the costs are equal.
|>
|> Yes, but you pay a heavy price. Stack machines basically compute
|> expressions of the type
|> ((((...) op A) op B) op C)
|> so each operation depends on the previous one (if nothing else, then through
|> the stack pointer, since all ops operate on the stack), so superscalar
|> execution or even deep pipelining is totally out of the question.
|> I used to like stack machines: they are beautiful, simple, allow easy
|> compilers, BUT THEY JUST DID NOT SCALE - neither in modern design
|> technologies nor when it comes to compilers.
|>
Expressions like ((((...) op A) op B) op C) are quite typical of high level
languages. I rarely see scalable instructions in the output of good C
compilers, and if there are any, they are branches or loads/stores. These
kinds of instruction are scalable on stack processors, too. And you can scale
stack operations against arithmetic operations. Then we get the same rate of
more than 70% collision-free execution of 2 instructions at a time (even
more: one or two stack ops, if they can be combined, one arithmetic op, one
load/store and a call or branch). Stack processors are certainly scalable.
Certainly you must have some sort of pipelining then.

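To make that concrete, here is a small Forth sketch (the word name QUAD is
made up for illustration): the two sums in (a+b)*(c+d) do not depend on each
other, so a dual-issue stack machine may overlap them; only the final *
needs both partial results.

\ (a+b)*(c+d) as stack code: the two + operations are
\ independent of each other; only the final * depends on both.
: quad ( a b c d -- n )
  + >r          \ c+d, parked on the return stack
  +             \ a+b
  r> * ;        \ fetch c+d back and multiply
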
|> |> Compiler complexity: Tanenbaum had a compiler project that did a lot of
|> |> optimization on the intermediate stack machine code. I think you have
|> |> little to do to convert this code for a stack machine, whereas you have a
|> |> lot of work to allocate registers and schedule instructions, as you have
|> |> to do on "conventional" RISC processors.
|>
|> Yes, for simple arithmetic expressions they allow way cool
|> optimizations. But what do you do when you get to things like CSE? DUP
|> DUP SWAP SWAP ROT? Hardly efficient! 2 memory accesses for DUP, 4 for
|> SWAP, 6 for ROT. You simply have to shuffle intermediate results too
|> much. Or save them in memory - definitely more expensive than a
|> register. Once again, with the technology of 10 years ago they were
|> nice, but it does pay to allocate registers and do scheduling, AND WE
|> HAVE THE TECHNOLOGY NOW to do it.
|>
On modern stack machines like the SC32 you need NO memory access for DUP;
even if something is spilled, you may have one (less than 1%), and certainly
NO memory access for SWAP and ROT. The typical situation (all over my system)
is about one stack operation per arithmetic or load/store operation. Call it
a one-address machine; I call it combining stack ops and ALU ops in one
opcode. Even the first Forth processor (the Novix 4000) did that. For highest
performance you may pipeline the stack and ALU operations.

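A common subexpression simply stays on the stack. A small sketch (the word
POLY is my own example, not from any benchmark): x*x + x needs two DUPs and
no memory traffic at all, and a machine that combines stack ops with ALU ops
can fold the DUP into the multiply opcode.

\ x*x + x: the common value x is kept on the stack with DUP.
\ On a machine like the SC32 these DUPs cost no memory access.
: poly ( x -- x*x+x )
  dup dup * + ;
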
|> |> Spill & fill: has anyone thought about automatic spill & fill buffers? If
|> |> the number of available stack cells decreases below a certain level, a
|> |> number of stack items is loaded (4 or 8 would be great), and if it
|> |> increases above a certain level, the same number of items is spilled
|> |> (with a hysteresis of about 8 stack entries).
|>
|> I figure an implementation of this is just too hairy - on the same real
|> estate, you can put 16 extra registers ;)
|>
Really? It's not the same. Think: with 3-address opcodes you spend 3*5 = 15
bits on addressing those registers, where one out of 8 (3 bits) would be
enough in many cases. A stack is quite simple to implement in hardware, and
you have no need to address these 16 extra "registers" directly for spill
and fill (sequential addressing is enough).

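To show what I mean, a rough high-level Forth model of that hysteresis (the
thresholds and the words ?FILL and ?SPILL are invented for illustration, not
taken from any real chip):

\ Model of the proposed spill/fill buffer. DEPTH# counts the
\ cells currently held on chip; we move 8 cells at a time, so
\ small depth jitter never thrashes the bus.
variable depth#                \ on-chip stack cells in use
: ?fill  ( -- )  depth# @ 4 <  if ( load 8 cells from memory )  8 depth# +! then ;
: ?spill ( -- )  depth# @ 12 > if ( store 8 cells to memory )  -8 depth# +! then ;

With fill at 4 and spill at 12, a word that pushes and pops one item over and
over never causes any memory traffic at all.
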
|> mike
|> --
|>
|> Michael Gschwind, Dept. of VLSI-Design, Vienna University of Technology
- "Technology" sounds good, but TU=Technische Universitaet, it's a university of
- technics :-) (or would you translate "Elektro-Technik" into "Technology of
- electricity"?-)
|> mike@vlsivie.tuwien.ac.at 1-2-3-4 kick the lawsuits out the door
|> (currently somewhere in 5-6-7-8 innovate don't litigate
|> the Bay Area, back sometime 9-A-B-C interfaces should be free
|> at the end of this year) D-E-F-O look and feel has got to go!
|>

--
Bernd Paysan
"Late answers are wrong answers!"