- Path: nntp.teleport.com!sschaem
- From: sschaem@teleport.com (Stephan Schaem)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: PPC compilers
- Date: 15 Feb 1996 22:23:53 GMT
- Organization: Teleport - Portland's Public Access (503) 220-1016
- Message-ID: <4g0bpp$gh1@maureen.teleport.com>
- References: <311e8ed0@lls.se> <38232439@kone.fipnet.fi>
- NNTP-Posting-Host: kelly.teleport.com
- X-Newsreader: TIN [version 1.2 PL2]
-
- Jyrki Saarinen (jsaarinen@kone.fipnet.fi) wrote:
-
- : > JS> move.w d2,d4 ;u<<8
- : > JS> move.l a0,d5 ;g<<8
- : > JS> move.b d2,d4 ;+v
- : > JS> move.b (a2,d4.w),d5 ;read texel
- : > JS> move.b (a3,d5.l),(a4)+ ;read pixel from the shading table
- : > JS> addx.l d3,d2 ;u+=ustep
- : > JS> addx.l d1,d0 ;v+=vstep
- : > JS> add.l a1,a0 ;g+=gstep
- : >
- : > JS> That should be about 30 68020/68030 cycles per pixel.
- : >
- : > JS> Now, how the hell this should be scheduled for maximum
- : > JS> performance on the 68040 and on the 68060? Anyone?
- : >
- : > For the 68040, the only thing to be done is to put multi-cycle
- : > instructions before complex ("multicycle") addresses, but there are no
- : > such instructions here..
-
- : What are such instructions?
-
- : > For the 68060, try to "reorder away" register dependencies. That is, try to
- : > make sure that two instructions can be executed in parallel. And in
- : > that respect, nothing much can be done either.
- : >
- : > Summary: I don't think you can do much better than it is.
-
- : Let us try:
-
- : move.w d2,d4 ;u<<8
- : move.l a0,d5 ;g<<8
- : move.b d2,d4 ;+v
- : addx.l d3,d2 ;u+=ustep
- : move.b (a2,d4.w),d5 ;read texel
- : addx.l d1,d0 ;v+=vstep
- : move.b (a3,d5.l),(a4)+ ;read pixel from the shading table
- : add.l a1,a0 ;g+=gstep
-
- I think this is faster on all 680x0s and uses 2 fewer registers
- (leaving 6 free registers for the yloop), and it is no bigger in bytes
- than your version:
-
- move.w d1,d2 ;set texture ypos
- addx.l d4,d1 ;step in texture
- move.b d0,d2 ;set texture xpos
- move.l d2,a0 ;texel address
- move.l a2,d2 ;lighting table
- addx.l d3,d0 ;step in texture
- move.b (a0),d2 ;Set texel in lighting table
- move.l d2,a0 ;Get lighting*texel adr in a usable reg
- move.b (a0),(a1)+ ; write lighted texel
- adda.l a3,a2 ; next light value
-
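For anyone following the thread without a 680x0 manual to hand, here is a rough C model of what both inner loops compute per pixel: a texel fetch from a 256x256 texture, then a lookup through the shading table. This is only a sketch. The function name `shade_span`, the parameter names, and the plain 16.16 fixed-point layout are assumptions made for readability; the assembly versions pack the fractions differently and carry them between registers with addx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of the shaded texture-mapping inner loop.
 * Assumed layout (not taken from the posts):
 *   texture: 256x256 bytes, indexed as (u << 8) | v
 *   shade:   256 bytes per light level, indexed as (g << 8) | texel
 *   u, v, g and their steps: 16.16 fixed point, integer part in bits 16..23
 */
void shade_span(uint8_t *dst, size_t count,
                const uint8_t *texture, const uint8_t *shade,
                uint32_t u, uint32_t v, uint32_t g,
                uint32_t ustep, uint32_t vstep, uint32_t gstep)
{
    assert(dst != NULL && texture != NULL && shade != NULL);

    for (size_t i = 0; i < count; i++) {
        uint32_t ui = (u >> 16) & 0xFF;            /* integer part of u */
        uint32_t vi = (v >> 16) & 0xFF;            /* integer part of v */
        uint32_t gi = (g >> 16) & 0xFF;            /* integer light level */

        uint8_t texel = texture[(ui << 8) | vi];   /* read texel */
        dst[i] = shade[(gi << 8) | texel];         /* shading table lookup */

        u += ustep;                                /* u += ustep */
        v += vstep;                                /* v += vstep */
        g += gstep;                                /* g += gstep */
    }
}
```

The two dependent loads per pixel (texel, then shading table) are the part that instruction reordering cannot remove; the scheduling question is only what useful work can be placed between them.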
- : Is it any better now?
-
- Stephan
-