- Newsgroups: comp.arch
- Path: sparky!uunet!stanford.edu!bcm!rice!cliffc
- From: cliffc@rice.edu (Cliff Click)
- Subject: Re: RTX and SC32
- In-Reply-To: lamaster@pioneer.arc.nasa.gov's message of Wed, 4 Nov 1992 19:10:38 GMT
- Message-ID: <CLIFFC.92Nov5101357@miranda.rice.edu>
- Sender: news@rice.edu (News)
- Organization: Center for Research on Parallel Computations
- References: <17131@mindlink.bc.ca> <1992Nov4.191038.12063@news.arc.nasa.gov>
- Date: Thu, 5 Nov 1992 16:13:57 GMT
- Lines: 72
-
- lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
- > So, the question remains: are there any Forth operations which are not
- > efficiently supported on the current crop of RISCs, and are on the
- > named Forth machines? If so, what operations are they, and why?
- > Could they be added to the current RISC architectures without major damage?
-
- I'll try to compare a popular RISC (Sparc) to a hypothetical Forth machine,
- sort of the cross section of Forth machines I've seen. Let's assume similar
- implementation technologies ('cause a Viking's going to blow the doors off
- any current Forth chip, but it comes with 10x the transistors). Ok, here goes:
-
- 1) Zero-cycle return
-
- The return is folded into the last arithmetic op of the subroutine.
- The instruction following the subroutine call executes on the very next
- cycle, no delay slot. My Sparc (very RISCy) takes 2 cycles: 1 for
- the "return" and 1 for the "restore" in its delay slot. Note that getting the
- return parameter into the TOS slot is only slightly less difficult than
- getting the return parameter into the return-result register. Graph
- coloring register allocators win the day here.
-
- 2) 1 cycle call
-
- Calls take 1 cycle, no delay slot. On a Sparc, 1 for the call and 1 for
- the "save".
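-
- To put numbers on points 1 and 2, here is a rough sketch that just adds
- up the cycle counts quoted above. The function name and the 1000-call
- workload are made up for illustration, and it ignores window traps, cache
- misses, and any useful work a compiler manages to put in a delay slot.

```python
# Cycle counts as quoted in points 1 and 2 of this post.
SPARC_CALL = 2     # 1 for the "call", 1 for the "save"
SPARC_RETURN = 2   # 1 for the "return", 1 for the "restore"
FORTH_CALL = 1     # 1-cycle call, no delay slot
FORTH_RETURN = 0   # return folded into the last arithmetic op


def call_overhead(n_calls, call_cycles, return_cycles):
    """Total cycles spent purely on call/return linkage (hypothetical helper)."""
    return n_calls * (call_cycles + return_cycles)


sparc = call_overhead(1000, SPARC_CALL, SPARC_RETURN)
forth = call_overhead(1000, FORTH_CALL, FORTH_RETURN)
print(sparc, forth)
```

- Under these assumptions the linkage cost is 4 cycles per call/return pair
- on the Sparc against 1 on the Forth machine.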
-
- 3) (Sparc only) Hardware support for the stack
-
- Window overflow/underflow penalties are similar to the hardware supported
- stack overflow/underflow mechanisms on Forth machines. However, the
- granularity on a Forth machine is much better (1 word vs 16) so that
- small recursive programs behave more reasonably.
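-
- A toy model of the granularity point, not a description of any real chip.
- Assumptions (all mine): both machines keep 128 words on chip, the Sparc
- moves a whole 16-word window per overflow or underflow, the Forth machine
- moves 1 word, and each Forth call pushes only a return address. The
- function and its parameters are hypothetical.

```python
def spill_traffic(depth, capacity_words, frame_words):
    """Words moved between chip and memory for one recursion to `depth`
    followed by a full unwind. Assumes frame_words <= capacity_words."""
    on_chip = 0     # words currently held on chip
    in_memory = 0   # words that have been spilled to memory
    traffic = 0     # total words moved in either direction

    for _ in range(depth):                      # the calls
        if on_chip + frame_words > capacity_words:
            on_chip -= frame_words              # overflow: spill oldest frame
            in_memory += frame_words
            traffic += frame_words
        on_chip += frame_words

    for _ in range(depth):                      # the returns
        on_chip -= frame_words
        if on_chip == 0 and in_memory > 0:
            on_chip += frame_words              # underflow: refill one frame
            in_memory -= frame_words
            traffic += frame_words
    return traffic


# Recursion 20 deep, 128 words on chip in both cases.
print(spill_traffic(20, 128, 16))   # window-sized frames
print(spill_traffic(20, 128, 1))    # word-sized frames
```

- In this model the 16-word frames start spilling at the ninth call, while
- the word-sized frames never leave the chip at depth 20.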
-
- 4) Denser code
-
- Forth machines tend to have 8 or 16 bit opcodes, and you do NOT always
- need gobs of 'em to get the same job done. Personal experience suggests
- the code is denser by up to a factor of 2. Naturally the RISC guys can
- throw silicon at this problem: big I-cache and a fat path to memory.
-
- Many other things can remain essentially the same between the architectures.
- An 8 bit opcode can reference the top 16 stack elements, so 32 bits can
- do a "push r1", "push r2", "op", "pop r3" - basically imitate a 3-address
- instruction. And I've seen a chip which can do all 4 in one 50MHz cycle
- (really it was a 200MHz internal clock). No Viking technology here, it was
- under 33,000 transistors.
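-
- Here is one way the four-opcodes-in-32-bits trick could be laid out. The
- encoding is invented for illustration (high 4 bits pick the operation,
- low 4 bits pick one of the top 16 stack slots); it is not the format of
- any particular Forth chip, and the opcode values are arbitrary.

```python
# Hypothetical 4-bit operation codes.
PUSH, POP, ADD = 0x1, 0x2, 0x3


def op8(kind, slot=0):
    """Build one 8-bit opcode: operation in the high nibble, slot in the low."""
    assert 0 <= kind < 16 and 0 <= slot < 16
    return (kind << 4) | slot


def pack(ops):
    """Pack exactly four 8-bit opcodes into one 32-bit word, first op on top."""
    assert len(ops) == 4
    word = 0
    for op in ops:
        word = (word << 8) | op
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit opcodes."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


# "push r1", "push r2", "op", "pop r3": a 3-address add in one 32-bit word.
word = pack([op8(PUSH, 1), op8(PUSH, 2), op8(ADD), op8(POP, 3)])
print(hex(word))
```

- With this layout the word plays the role of "add r1, r2, r3" on a
- 3-address machine and takes the same 32 bits.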
-
- You can cache the top 16/256/whatever stack elements onchip.
- You can multi-port the access to the top 16 stack elements.
-
- You can cache the top 16/whatever RETURN stack elements onchip as well.
- This is how the zero-cycle return works.
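-
- A minimal interpreter sketch of the folded return, assuming a made-up
- instruction format of (op, argument, return-flag). Every instruction costs
- one cycle here, and a set return flag pops the return stack in that same
- cycle. The op names and the program are hypothetical.

```python
def run(program):
    """Execute a toy stack program; returns (data stack, cycles used)."""
    data, rstack = [], []
    pc = 0
    cycles = 0
    while True:
        op, arg, ret = program[pc]
        cycles += 1                      # one cycle per instruction
        next_pc = pc + 1
        if op == "lit":
            data.append(arg)
        elif op == "add":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif op == "call":
            rstack.append(pc + 1)        # return address goes on the return stack
            next_pc = arg
        elif op == "halt":
            return data, cycles
        if ret:                          # return folded into this instruction
            next_pc = rstack.pop()
        pc = next_pc


PROGRAM = [
    ("lit", 2, False),
    ("lit", 3, False),
    ("call", 4, False),
    ("halt", None, False),
    ("add", None, True),                 # subroutine body: add, then return
]

print(run(PROGRAM))
```

- The subroutine is a single "add" with the return flag set, so the return
- itself adds no cycle of its own to the count.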
-
- Of course, you can add an I-cache and D-cache as well. But the built-in
- stacks basically amount to "caches" focused on serving a particular access
- pattern.
-
- And now a soapbox:
-
- Compiler technology is driven by computer architectures.
- If a big-name company produced a blazing stack machine and handed a few
- out to CS departments doing compiler research, compiler researchers
- (like myself, hint, hint) would find the moral equivalent of the
- "graph coloring register allocator" for stack machines.
-
- End soapbox.
-
-
- Cliff
- --
- Through a gentle rain / the moon throws silver shadows /
- and tears fall like pearls. Cliff Click (cliffc@cs.rice.edu)
-