NetNews Usenet Archive 1992 #16


Text File | 1992-07-20 | 3.2 KB | 72 lines

Newsgroups: comp.lang.forth
Path: sparky!uunet!email!mips.complang.tuwien.ac.at!anton
From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
Subject: Re: Non-Forth systems/languages.
Message-ID: <1992Jul21.084657.3124@email.tuwien.ac.at>
Sender: news@email.tuwien.ac.at
Nntp-Posting-Host: mips.complang.tuwien.ac.at
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: <3898.UUL1.3#5129@willett.pgh.pa.us> <1992Jul20.180116.5853@Informatik.TU-Muenchen.DE>
Date: Tue, 21 Jul 1992 08:46:57 GMT
Lines: 59

In article <1992Jul20.180116.5853@Informatik.TU-Muenchen.DE>, pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan) writes:
|>
|> In article <3898.UUL1.3#5129@willett.pgh.pa.us>, ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) writes:
|> |> Category 3, Topic 16
|> |> Message 84 Fri Jul 17, 1992
|> |> E.RATHER [Elizabeth] at 00:01 EDT
|> |>
|> |> In our experience w/ "funny chips" the way to really get performance out of a
|> |> strange architecture is to design the Forth machine to take as much advantage
|> |> of the architecture as possible internally. For example, on the i860 we use a
|> |> modified dtc in which the 1st instruction of the "jumped to" code actually
|> |> follows the jump in the calling def, because the pipelining enables that
|> |> instruction to be executed "for free" during the jump. It's unclear whether
|> |> writing the code in C as Ertl has done can facilitate that type of
|> |> optimization.
|>
|> At least the code for HP-PA doesn't, but it doesn't run, either :-(. It compiles
|> NEXT DTC as
|>
|>         ldws,ma 4(0,%r20),%r19
|>         bv 0(%r19)
|>         nop
|>
|> and adds a nop to the pipeline. The SPARC code added the increment of the pointer
|> in the branch delay, but I don't have it here now, so I can't post it. The
|> ldws,ma increments the pointer, so nothing was found to fit into the pipeline
|> (and I can't find anything by hand, too).
|> The NEXT ITC is quite the same:
|>
|>         ldws,ma 4(0,%r20),%r19
|>         ldw 0(0,%r19),%r19
|>         bv 0(%r19)
|>         nop

What you don't see are the delay slots between each load and the use
of the loaded value in the next instruction. However, in a real Forth
primitive these delay slots will be filled (and the nop after the
branch will be replaced) with other instructions (for computing and
stack manipulation). This is done automatically by the C compiler.

There is one possible optimization that I don't know how to express in
GNU C. You can pull the load of the next execution token into the
previous primitive, where it can fill the branch delay slot, i.e.:

NEXT DTC
        bv 0(%r19)
        ldws,ma 4(0,%r20),%r19

This would reduce cache miss sensitivity and may reduce delay slots in
some primitives. On the other hand, loading the next execution token
early will be wasted effort in at least 25% of the executed
instructions (CALL and EXIT, which change the instruction pointer and
so discard the token that was loaded ahead), so this "optimization"
may backfire. It will probably pay off in processors of the next
generation, which have many delay slots. So perhaps GNU C will give us
another extension for expressing such things.

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
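
The reordered NEXT discussed above (jump to the token that is already
loaded, then load the following one) can be imitated at the source
level with GNU C's labels-as-values extension. What follows is only a
sketch under assumptions: the names run_prefetch_demo, do_lit, do_add,
do_branch, do_halt and the variable next are hypothetical and do not
come from the posted system, and nothing here guarantees that a
compiler will actually place the load in a branch delay slot. It needs
GCC or a compatible compiler.

```c
#include <stdint.h>

typedef void *Inst;  /* one threaded-code cell: primitive address or operand */

/* Hypothetical demo: executes "2 40 +" with a branch over a dead
   literal and returns the top of stack. */
long run_prefetch_demo(void)
{
    /* Direct-threaded code; &&label is the GNU C address-of-label
       extension. */
    Inst code[] = {
        &&do_lit,    (Inst)(intptr_t)2,      /* cells 0, 1 */
        &&do_lit,    (Inst)(intptr_t)40,     /* cells 2, 3 */
        &&do_branch, (Inst)(intptr_t)8,      /* cells 4, 5: go to cell 8 */
        &&do_lit,    (Inst)(intptr_t)1000,   /* cells 6, 7: skipped */
        &&do_add,                            /* cell 8 */
        &&do_halt,                           /* cell 9 */
        (Inst)0      /* pad: the prefetch reads one cell past do_halt */
    };
    long stack[8];
    long *sp = stack;   /* points at the next free stack slot */
    Inst *ip = code;    /* always one cell ahead of `next` */
    Inst next;          /* cell that was loaded ahead of time */

    /* Prefetching NEXT: jump to the cell loaded earlier, and load the
       following cell before the jump is taken. */
#define NEXT do { Inst target = next; next = *ip++; goto *target; } while (0)

    next = *ip++;       /* prime the prefetch */
    NEXT;

do_lit:                 /* the inline operand is already in `next` */
    *sp++ = (long)(intptr_t)next;
    next = *ip++;       /* refill the prefetch after consuming the operand */
    NEXT;

do_add:
    sp[-2] += sp[-1];
    sp--;
    NEXT;

do_branch:              /* operand (target cell index) is in `next` */
    ip = code + (intptr_t)next;
    next = *ip++;       /* ip changed, so the prefetch must be redone */
    NEXT;

do_halt:
    return sp[-1];
#undef NEXT
}
```

Calling run_prefetch_demo() runs the small program once. Note the
price of the scheme even in this toy: every primitive that consumes an
inline operand or redirects ip has to refill next by hand, and the
code needs a pad cell because the load runs one cell ahead of
execution.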