- Newsgroups: comp.lang.forth
- Path: sparky!uunet!email!mips.complang.tuwien.ac.at!anton
- From: anton@mips.complang.tuwien.ac.at (Anton Martin Ertl)
- Subject: Re: Non-Forth systems/languages.
- Message-ID: <1992Jul21.084657.3124@email.tuwien.ac.at>
- Sender: news@email.tuwien.ac.at
- Nntp-Posting-Host: mips.complang.tuwien.ac.at
- Organization: Institut fuer Computersprachen, Technische Universitaet Wien
- References: <3898.UUL1.3#5129@willett.pgh.pa.us> <1992Jul20.180116.5853@Informatik.TU-Muenchen.DE>
- Date: Tue, 21 Jul 1992 08:46:57 GMT
- Lines: 59
-
- In article <1992Jul20.180116.5853@Informatik.TU-Muenchen.DE>, pazsan@Informatik.TU-Muenchen.DE (Bernd Paysan) writes:
- |>
- |> In article <3898.UUL1.3#5129@willett.pgh.pa.us>, ForthNet@willett.pgh.pa.us (ForthNet articles from GEnie) writes:
- |> |> Category 3, Topic 16
- |> |> Message 84 Fri Jul 17, 1992
- |> |> E.RATHER [Elizabeth] at 00:01 EDT
- |> |>
- |> |> In our experience w/ "funny chips" the way to really get performance out of a
- |> |> strange architecture is to design the Forth machine to take as much advantage
- |> |> of the architecture as possible internally. For example, on the i860 we use a
- |> |> modified dtc in which the 1st instruction of the "jumped to" code actually
- |> |> follows the jump in the calling def, because the pipelining enables that
- |> |> instruction to be executed "for free" during the jump. It's unclear whether
- |> |> writing the code in C as Ertl has done can facilitate that type of
- |> |> optimization.
- |>
- |> At least the code for HP-PA doesn't, but it doesn't run, either :-(. It compiles
- |> NEXT DTC as
- |>
- |> ldws,ma 4(0,%r20),%r19
- |> bv 0(%r19)
- |> nop
- |>
- |> and adds a nop to the pipeline. The SPARC code added the increment of the pointer
- |> in the branch delay, but I don't have it here now, so I can't post it. The
- |> ldws,ma increments the pointer, so nothing was found to fit into the pipeline
- |> (and I can't find anything by hand, too). The NEXT ITC is quite the same:
- |>
- |> ldws,ma 4(0,%r20),%r19
- |> ldw 0(0,%r19),%r19
- |> bv 0(%r19)
- |> nop
-
- What you don't see are the delay slots between the load and the use of
- the loaded value in the next instruction. However, in a real Forth
- primitive these delay slots will be filled (and the nop after the
- branch will be replaced) with other instructions (for computing and
- stack manipulation). The C compiler does this automatically.
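
For readers who have not seen threaded code written in C, here is a minimal sketch of a direct-threaded NEXT using the GNU C "labels as values" extension (`&&label` and `goto *ptr`). It is not the code from the interpreter discussed in this thread: the function name `run_dtc_demo`, the primitives `lit`, `add` and `halt`, and the tiny program are all made up for illustration. The point is only that NEXT is ordinary C, so the compiler is free to schedule each primitive's stack work around the load and the indirect jump.

```c
#include <assert.h>
#include <stdint.h>

/* NEXT for direct threaded code: fetch the next execution token,
   advance the instruction pointer, and jump to the token's code. */
#define NEXT goto **ip++

/* Hypothetical demo: runs the threaded program "40 2 + halt". */
long run_dtc_demo(void)
{
    intptr_t stack[8];
    intptr_t *sp = stack;      /* points at the next free stack cell */

    /* Threaded code: each cell is the address of a primitive's code;
       literals are stored inline after the lit token. */
    void *program[] = {
        &&lit, (void *)(intptr_t)40,
        &&lit, (void *)(intptr_t)2,
        &&add,
        &&halt
    };
    void **ip = program;       /* Forth instruction pointer */

    NEXT;

lit:                           /* ( -- n ) push the inline literal */
    *sp++ = (intptr_t)*ip++;
    NEXT;

add:                           /* ( a b -- a+b ) */
    sp--;
    sp[-1] += sp[0];
    NEXT;

halt:                          /* leave the interpreter */
    assert(sp == stack + 1);   /* exactly one result is expected */
    return (long)sp[-1];
}
```

Calling `run_dtc_demo()` interprets the three-word program and returns the value left on the stack. Which instructions end up in the load and branch delay slots depends entirely on the compiler and target.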
-
- There is one possible optimization that I don't know how to express in
- GNU C. You can move the load of the next execution token into the
- previous primitive, where it can fill the branch delay slot, i.e.:
-
- NEXT DTC
- bv 0(%r19)
- ldws,ma 4(0,%r20),%r19
-
- This would reduce cache miss sensitivity and may reduce delay slots in
- some primitives. On the other hand, loading the next execution token
- will be wasted effort in at least 25% of the executed instructions
- (CALL and EXIT, which change the instruction pointer), so this
- "optimization" may backfire. It will probably pay off on
- next-generation processors, which have many delay slots. So perhaps
- GNU C will give us another extension for expressing such things.
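
The data flow of this prefetching scheme can at least be modelled at the source level, as in the sketch below. This is an assumption-laden illustration, not a way to force the two-instruction sequence shown above: whether the load lands in the branch delay slot is still up to the compiler's scheduler. All names (`run_prefetch_demo`, `next`, the `call` and `exit_` primitives with an inline call target) are hypothetical. Each primitive is entered with `next` already holding the following cell, so `exit_` has to throw that value away and reload, which is the wasted effort mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On entry to a primitive, `next` already holds the cell that follows
   its token. NEXT jumps to that token and prefetches one more cell. */
#define NEXT do { void *target = next; next = *ip++; goto *target; } while (0)

/* Hypothetical demo: main thread "40 sub halt", where sub is "2 + exit". */
long run_prefetch_demo(void)
{
    intptr_t stack[8];
    intptr_t *sp = stack;          /* data stack, next free cell */
    void **rstack[4];
    void ***rp = rstack;           /* return stack, next free cell */

    /* The trailing NULL cells are padding: the prefetch always reads
       one cell beyond the token that is about to be executed. */
    void *sub[] = { &&lit, (void *)(intptr_t)2, &&add, &&exit_, NULL };
    void *program[] = { &&lit, (void *)(intptr_t)40,
                        &&call, (void *)sub, &&halt, NULL };

    void **ip = program;
    void *next = *ip++;            /* prime the prefetch register */
    NEXT;

lit:                               /* the prefetched cell is the literal */
    *sp++ = (intptr_t)next;
    next = *ip++;                  /* reload: fetch the real next token */
    NEXT;

add:                               /* ( a b -- a+b ), prefetch is used as is */
    sp--;
    sp[-1] += sp[0];
    NEXT;

call:                              /* the prefetched cell is the call target */
    *rp++ = ip;                    /* save the return address */
    ip = (void **)next;
    next = *ip++;
    NEXT;

exit_:                             /* the prefetched cell is discarded here */
    ip = *--rp;
    next = *ip++;
    NEXT;

halt:
    assert(sp == stack + 1 && rp == rstack);
    return (long)sp[-1];
}
```

In this model only `add` benefits from the prefetch unchanged; `lit` and `call` consume the prefetched cell as an operand and then reload, and `exit_` discards it outright.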
-
- - anton
- --
- M. Anton Ertl Some things have to be seen to be believed
- anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
-