NetNews Usenet Archive 1992 #26

Xref: sparky comp.arch:10531 comp.lang.forth:3459
Newsgroups: comp.arch,comp.lang.forth
Path: sparky!uunet!email!vlsivie.tuwien.ac.at!mike
From: mike@vlsivie.tuwien.ac.at (Michael Gschwind)
Subject: Re: What's RIGHT with stack machines
In-Reply-To: pazsan@Informatik.TU-Muenchen.DE's message of Wed, 4 Nov 1992 10:30:08 GMT
Message-ID: <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at>
Sender: news@email.tuwien.ac.at
Nntp-Posting-Host: guam.vlsivie.tuwien.ac.at
Organization: Vienna University of Technology
References: <Bx5AIr.EAy.2@cs.cmu.edu> <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE>
Date: Sun, 8 Nov 1992 23:40:26 GMT
Lines: 50

In article <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan) writes:

> Thanks for your interesting report. I have little to add.
>
> Code size and performance: reduced code size for equal operations
> means less bus bandwidth is requested. Since buses (and DRAMs) are
> today's von Neumann bottlenecks, stack CPUs reduce the cost of main
> memory and second-level caches, and may increase performance at
> equal cost.

Yes, but you pay a heavy price. Stack machines basically compute
expressions of the type ((((...) op A) op B) op C), so each operation
depends on the previous one (if nothing else, all ops operate on the
stack, so there is the stack pointer dependency). Superscalar execution
or even deep pipelining is totally out of the question. I used to like
stack machines: they are beautiful, simple, and allow easy compilers,
BUT THEY JUST DID NOT SCALE - neither in modern design technologies nor
when it comes to compilers.

> Compiler complexity: Tanenbaum had a compiler project that did much
> of its optimizing on the intermediate stack machine code. I think
> there is little to do to convert this code for a stack machine,
> whereas there is a lot of work in allocating registers and scheduling
> instructions, as you have to do on "conventional" RISC processors.

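To make the "little to do" point in the quoted paragraph concrete, here is a minimal sketch of stack code generation by post-order traversal of an expression tree. The tree encoding and the PUSH/ADD/SUB/MUL mnemonics are assumptions made for illustration; they are not Tanenbaum's intermediate code or any real instruction set.

```python
# Hypothetical expression tree: a leaf is a variable name (str),
# an inner node is a tuple (op, left, right).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}  # illustrative mnemonics only


def emit_stack_code(node):
    """Return stack-machine instructions for `node` via post-order traversal."""
    if isinstance(node, str):
        return [f"PUSH {node}"]
    op, left, right = node
    # Operands first, operator last: no register allocation decisions arise.
    return emit_stack_code(left) + emit_stack_code(right) + [OPCODES[op]]


expr = ("*", ("+", "a", "b"), ("-", "c", "d"))  # (a + b) * (c - d)
for insn in emit_stack_code(expr):
    print(insn)
```

Every instruction in the emitted sequence reads or writes the top of the stack, so even the two independent subexpressions end up serialized through the stack pointer, which is the dependency the reply above objects to.
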
Yes, for simple arithmetic expressions, they allow way cool
optimizations. But what do you do when you get to things like CSE
(common subexpression elimination)? DUP DUP SWAP SWAP ROT? Hardly
efficient! 2 memory accesses for DUP, 4 for SWAP, 6 for ROT. You simply
have to shuffle intermediate results too much. Or save them in memory,
which is definitely more expensive than a register. Once again, with
the technology of 10 years ago, they were nice, but it does pay to
allocate registers and do scheduling, AND WE HAVE THE TECHNOLOGY NOW to
do it.

> Spill&Fill: has anyone thought about automatic spill&fill buffers?
> If the number of available stack cells drops to a certain level, a
> number of stack items are loaded (4 or 8 would be great), and if it
> rises above a certain level, the same number of stack items is
> spilled (with a hysteresis of about 8 stack entries).

I figure implementing this is just too hairy - on the same real
estate, you can put 16 extra registers ;)

mike
--
Michael Gschwind, Dept. of VLSI-Design, Vienna University of Technology
mike@vlsivie.tuwien.ac.at    1-2-3-4 kick the lawsuits out the door
(currently somewhere in      5-6-7-8 innovate don't litigate
the Bay Area, back sometime  9-A-B-C interfaces should be free
at the end of this year)     D-E-F-O look and feel has got to go!
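
The access counts given in the reply above (2 for DUP, 4 for SWAP, 6 for ROT) can be reproduced with a toy model. This is only a sketch under one loud assumption: the whole data stack lives in memory and every cell read or write costs one access, with no top-of-stack registers. The class and function names are hypothetical.

```python
class MemoryStack:
    """Toy data stack held entirely in memory; counts every cell read/write."""

    def __init__(self, items=()):
        self.cells = list(items)  # last element is the top of stack
        self.accesses = 0

    def _read(self, depth):
        self.accesses += 1
        return self.cells[-1 - depth]

    def _write(self, depth, value):
        self.accesses += 1
        self.cells[-1 - depth] = value

    def dup(self):   # ( a -- a a )
        top = self._read(0)
        self.cells.append(None)  # models the stack pointer bump, not an access
        self._write(0, top)

    def swap(self):  # ( a b -- b a )
        b, a = self._read(0), self._read(1)
        self._write(0, a)
        self._write(1, b)

    def rot(self):   # ( a b c -- b c a )
        c, b, a = self._read(0), self._read(1), self._read(2)
        self._write(2, b)
        self._write(1, c)
        self._write(0, a)


def cost(op_name):
    """Run one operation on a fresh stack [1, 2, 3]; return (accesses, cells)."""
    stack = MemoryStack([1, 2, 3])
    getattr(stack, op_name)()
    return stack.accesses, stack.cells


for name in ("dup", "swap", "rot"):
    print(name, cost(name))
```

Under a different assumption, for example a design that keeps the top few stack cells in registers, these counts would be lower; the model only shows where the quoted numbers come from.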