Xref: sparky comp.arch:10545 comp.lang.forth:3465
Newsgroups: comp.arch,comp.lang.forth
Path: sparky!uunet!think.com!yale.edu!ira.uka.de!math.fu-berlin.de!informatik.tu-muenchen.de!pazsan
From: pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan)
Subject: Re: What's RIGHT with stack machines
References: <Bx5AIr.EAy.2@cs.cmu.edu> <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at>
Originator: pazsan@hphalle5f.informatik.tu-muenchen.de
Sender: news@Informatik.TU-Muenchen.DE (USENET Newssystem)
Organization: Technische Universitaet Muenchen, Germany
Date: Mon, 9 Nov 1992 15:53:48 GMT
Message-ID: <1992Nov9.155348.6079@Informatik.TU-Muenchen.DE>
Lines: 83

In article <MIKE.92Nov9004026@guam.vlsivie.tuwien.ac.at>, mike@vlsivie.tuwien.ac.at (Michael Gschwind) writes:
|> In article <1992Nov4.103008.2641@Informatik.TU-Muenchen.DE> pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan) writes:
|> Thanks for your interesting report. I have little to add.
|>
|> |> Code size and performance: reduced code size for the same operations means
|> |> less bus bandwidth is required. As busses (and DRAMs) are the von Neumann
|> |> bottleneck today, stack CPUs reduce the cost of main memory and second
|> |> level caches and may increase performance if the costs are equal.
|>
|> Yes, but you pay a heavy price. Stack machines basically compute
|> expressions of the type
|> ((((...) op A) op B) op C)
|> so each operation depends on the previous one (if nothing else, then through
|> the stack pointer, since all ops operate on the stack), so superscalar
|> execution or even deep pipelining is totally out of the question.
|> I used to like stack machines: they are beautiful, simple, allow easy
|> compilers, BUT THEY JUST DID NOT SCALE - neither in modern design
|> technologies nor when it comes to compilers.
|>
Expressions like ((((...) op A) op B) op C) are quite typical of high level
languages. I rarely see scalable instructions in the output of good C
compilers, and if there are any, they are branches or loads/stores. These
kinds of instruction are scalable on stack processors, too. And you can scale
stack operations against arithmetic operations. Then we get the same rate of
more than 70% collision-free execution of 2 instructions at a time (even
more: one or two stack ops, if they can be combined, one arithmetic op, one
load/store and a call or branch). Stack processors are certainly scalable.
Certainly you must have some sort of pipelining then.

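To make that concrete, here is a small Forth sketch (the word name QUAD is
made up for illustration): the two sums in (a+b)*(c+d) do not depend on each
other, so a dual-issue stack machine may overlap them; only the final *
needs both partial results.

\ (a+b)*(c+d) as stack code: the two + operations are
\ independent of each other; only the final * depends on both.
: quad ( a b c d -- n )
  + >r          \ c+d, parked on the return stack
  +             \ a+b
  r> * ;        \ fetch c+d back and multiply
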
|> |> Compiler complexity: Tanenbaum had a compiler project that did a lot of
|> |> optimization on the intermediate stack machine code. I think you have
|> |> little to do to convert this code for a stack machine, whereas you have a
|> |> lot of work to allocate registers and schedule instructions, as you have
|> |> to do on "conventional" RISC processors.
|>
|> Yes, for simple arithmetic expressions they allow way cool
|> optimizations. But what do you do when you get to things like CSE? DUP
|> DUP SWAP SWAP ROT? Hardly efficient! 2 memory accesses for DUP, 4 for
|> SWAP, 6 for ROT. You simply have to shuffle intermediate results too
|> much. Or save them in memory - definitely more expensive than a
|> register. Once again, with the technology of 10 years ago they were
|> nice, but it does pay to allocate registers and do scheduling, AND WE
|> HAVE THE TECHNOLOGY NOW to do it.
|>
On modern stack machines like the SC32 you need NO memory access for DUP;
even if something is spilled, you may have one (less than 1%), and certainly
NO memory access for SWAP and ROT. The typical situation (all over my system)
is about one stack operation per arithmetic or load/store operation. Call it
a one-address machine; I call it combining stack ops and ALU ops in one
opcode. Even the first Forth processor (the Novix 4000) did that. For highest
performance you may pipeline the stack and ALU operations.

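A common subexpression simply stays on the stack. A small sketch (the word
POLY is my own example, not from any benchmark): x*x + x needs two DUPs and
no memory traffic at all, and a machine that combines stack ops with ALU ops
can fold the DUP into the multiply opcode.

\ x*x + x: the common value x is kept on the stack with DUP.
\ On a machine like the SC32 these DUPs cost no memory access.
: poly ( x -- x*x+x )
  dup dup * + ;
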
|> |> Spill & fill: has anyone thought about automatic spill & fill buffers? If
|> |> the number of available stack cells decreases below a certain level, a
|> |> number of stack items is loaded (4 or 8 would be great), and if it
|> |> increases above a certain level, the same number of items is spilled
|> |> (with a hysteresis of about 8 stack entries).
|>
|> I figure an implementation of this is just too hairy - on the same real
|> estate, you can put 16 extra registers ;)
|>
Really? It's not the same. Think: with 3-address opcodes you spend 3*5 = 15
bits on addressing those registers, where one out of 8 (3 bits) would be
enough in many cases. A stack is quite simple to implement in hardware, and
you have no need to address these 16 extra "registers" directly for spill
and fill (sequential addressing is enough).

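To show what I mean, a rough high-level Forth model of that hysteresis (the
thresholds and the words ?FILL and ?SPILL are invented for illustration, not
taken from any real chip):

\ Model of the proposed spill/fill buffer. DEPTH# counts the
\ cells currently held on chip; we move 8 cells at a time, so
\ small depth jitter never thrashes the bus.
variable depth#                \ on-chip stack cells in use
: ?fill  ( -- )  depth# @ 4 <  if ( load 8 cells from memory )  8 depth# +! then ;
: ?spill ( -- )  depth# @ 12 > if ( store 8 cells to memory )  -8 depth# +! then ;

With fill at 4 and spill at 12, a word that pushes and pops one item over and
over never causes any memory traffic at all.
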
|> mike
|> --
|>
|> Michael Gschwind, Dept. of VLSI-Design, Vienna University of Technology
- "Technology" sounds good, but TU=Technische Universitaet, it's a university of
- technics :-) (or would you translate "Elektro-Technik" into "Technology of
- electricity"?-)
|> mike@vlsivie.tuwien.ac.at 1-2-3-4 kick the lawsuits out the door
|> (currently somewhere in 5-6-7-8 innovate don't litigate
|> the Bay Area, back sometime 9-A-B-C interfaces should be free
|> at the end of this year) D-E-F-O look and feel has got to go!
|>

--
Bernd Paysan
"Late answers are wrong answers!"