NetNews Usenet Archive 1992 #26

Text File | 1992-11-05 | 3.6 KB | 85 lines

Newsgroups: comp.arch
Path: sparky!uunet!stanford.edu!bcm!rice!cliffc
From: cliffc@rice.edu (Cliff Click)
Subject: Re: RTX and SC32
In-Reply-To: lamaster@pioneer.arc.nasa.gov's message of Wed, 4 Nov 1992 19:10:38 GMT
Message-ID: <CLIFFC.92Nov5101357@miranda.rice.edu>
Sender: news@rice.edu (News)
Organization: Center for Research on Parallel Computations
References: <17131@mindlink.bc.ca> <1992Nov4.191038.12063@news.arc.nasa.gov>
Date: Thu, 5 Nov 1992 16:13:57 GMT
Lines: 72

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

> So, the question remains: are there any Forth operations which are not
> efficiently supported on the current crop of RISCs, and are on the
> named Forth machines? If so, what operations are they, and why?
> Could they be added to the current RISC architectures without major damage?

I'll try to compare a popular RISC (Sparc) to a hypothetical Forth
machine, sort of a cross section of the Forth machines I've seen. Let's
assume similar implementation technologies ('cause a Viking's going to
blow the doors off any current Forth chip, but it comes with 10x the
transistors). Ok, here goes:

1) Zero-cycle return
The return is folded into the last arithmetic op of the subroutine.
The instruction following the subroutine call executes on the very next
cycle, no delay slot. My Sparc (very RISCy) takes 2 cycles: 1 for the
"return" and, in the delay slot, a "restore". Note that getting the
return parameter into the TOS (top-of-stack) slot is only slightly less
difficult than getting the return parameter into the return-result
register. Graph coloring register allocators win the day here.

2) 1 cycle call
Calls take 1 cycle, no delay slot. On a Sparc, 1 for the call and 1 for
the "save".

3) (Sparc only) Hardware support for the stack
Window overflow/underflow penalties are similar to the hardware
supported stack overflow/underflow mechanisms on Forth machines.
However, the granularity on a Forth machine is much better (1 word vs
16) so that small recursive programs behave more reasonably.

4) Denser code
Forth machines tend to have 8 or 16 bit opcodes, and you do NOT always
need gobs of 'em to get the same job done. Personal experience suggests
the code is denser by up to a factor of 2. Naturally the RISC guys can
throw silicon at this problem: big I-cache and a fat path to memory.

Many other things can remain essentially the same between the
architectures. An 8 bit opcode can reference the top 16 stack elements,
so 32 bits can do a "push r1", "push r2", "op", "pop r3" - basically
imitate a 3-address instruction. And I've seen a chip which can do all
4 in one 50MHz cycle (really it was a 200MHz internal clock). No Viking
technology here, it was under 33,000 transistors.

You can cache the top 16/256/whatever stack elements onchip. You can
multi-port access to the top 16 stack elements. You can cache the top
16/whatever RETURN stack elements onchip as well. This is how the
zero-cycle return works. Of course, you can add an I-cache and D-cache
as well. But the built-in stacks basically amount to "caches" focused
on serving a particular access pattern.

And now a soapbox:

Compiler technology is driven by computer architectures. If a big-name
company produced a blazing stack machine and handed a few out to CS
departments doing compiler research, compiler researchers (like myself,
hint, hint) would find the moral equivalent of the "graph coloring
register allocator" for stack machines.

End soapbox.

Cliff
--
Through a gentle rain / the moon throws silver shadows / and tears fall like pearls.
Cliff Click (cliffc@cs.rice.edu)
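
The "push r1", "push r2", "op", "pop r3" packing mentioned in the post can be sketched in Python. This is only an illustration under stated assumptions: the post does not give an encoding, so the split of each 8-bit opcode into a 4-bit operation and a 4-bit stack-slot field, the operation codes PICK/ADD/SUB/STORE, and the helper names encode3 and execute are all made up here and do not describe any real Forth chip.

```python
# Hypothetical opcode layout (an assumption, not from any real chip):
# high 4 bits select the operation, low 4 bits select one of the
# top 16 stack slots (0 = top of stack).
PICK, ADD, SUB, STORE = 0x1, 0x2, 0x3, 0x4


def encode3(op, dst, src1, src2):
    """Pack 'dst = src1 op src2' into one 32-bit word of four 8-bit opcodes.

    Slot numbers are counted from the top of stack before the word runs.
    """
    # src2 is limited to 14 because it sits one slot deeper after the
    # first push and must still fit in the 4-bit slot field.
    assert 0 <= src1 <= 15 and 0 <= src2 <= 14 and 0 <= dst <= 15
    opcodes = [
        (PICK << 4) | src1,        # push a copy of slot src1
        (PICK << 4) | (src2 + 1),  # push a copy of slot src2 (now one deeper)
        (op << 4),                 # pop two operands, push the result
        (STORE << 4) | dst,        # pop the result and write it to slot dst
    ]
    word = 0
    for byte in opcodes:
        word = (word << 8) | byte
    return word


def execute(word, stack):
    """Run the four opcodes of one 32-bit word; the list's end is the TOS."""
    for shift in (24, 16, 8, 0):
        byte = (word >> shift) & 0xFF
        op, slot = byte >> 4, byte & 0xF
        if op == PICK:
            stack.append(stack[-1 - slot])
        elif op in (ADD, SUB):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == ADD else left - right)
        elif op == STORE:
            value = stack.pop()
            stack[-1 - slot] = value   # slot counted after the pop
        else:
            raise ValueError(f"unknown opcode byte {byte:#04x}")
    return stack


# Usage: slot2 = slot0 - slot1 on a three-element stack (TOS is 30).
word = encode3(SUB, dst=2, src1=0, src2=1)
result = execute(word, [7, 20, 30])
```

The stack depth is the same before and after the word executes, which is what lets the top 16 slots stand in for registers; the price in this particular sketch is that the second source operand can only name slots 0 through 14.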