- Path: nntp.teleport.com!sschaem
- From: sschaem@teleport.com (Stephan Schaem)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: TMapping again!
- Date: 5 Feb 1996 20:58:55 GMT
- Organization: Teleport - Portland's Public Access (503) 220-1016
- Message-ID: <4f5r2f$21f@maureen.teleport.com>
- References: <4d6v0t$3dt@maureen.teleport.com> <4dg4jk$km@news.cs.tu-berlin.de> <4dhvd5$5r2@maureen.teleport.com> <38232113@kone.fipnet.fi> <4e10ol$ck3@maureen.teleport.com> <4e2ku6$31m@news.cs.tu-berlin.de> <4eec27$pte@maureen.teleport.com> <4f4jof$h3b@news.cs.tu-berlin.de>
- NNTP-Posting-Host: linda.teleport.com
- X-Newsreader: TIN [version 1.2 PL2]
-
- Philipp Boerker (rawneiha@hydra.zrz.TU-Berlin.DE) wrote:
- : sschaem@teleport.com (Stephan Schaem) writes:
-
- : >Philipp Boerker (rawneiha@hydra.zrz.TU-Berlin.DE) wrote:
- : >: sschaem@teleport.com (Stephan Schaem) writes:
-
- : >: > repeat 8
- : >: > mw D1,D2
- : >: > mb D0,D2
- : >: > addx.l d7,D0
- : >: > movea.l d2,a0
- : >: > addx.l d6,D1
- : >: > mw (A0),d3
- : >: > mw D1,D2
- : >: > mb D0,D2
- : >: > movea.l d2,a0
- : >: > mb (A0),d3
- : >: > addx.l d7,D0
- : >: > addx.l d6,D1
- : >: > mw d3,(a1)+
- : >: > endr
-
-
- : >: I think mapping 2 pixels like you did is not optimal.
- : >: [...]
-
-
- : > 'proper' pipelining... or maximum overlap of bus and sequencer
- : > activity for my test is as above. I didn't count paper cycles,
- : > but saw my fps improve when I did the above vs. 2 move.b ,(a1)+
-
- : > (BTW notice the instruction register usage, and the ordering. should
- : > be optimal for a 060 and take the best advantage of overlap in the
- : > case of a 2 move.b to mem version)
-
- : The ordering can still be optimized for 060:
- : mw d1,d2 & mb d0,d2 have a data dependency. You could put one of the addx's
- : in between.
-
- Yes, that would be fine. I could switch the order of addition of X/Y
- so addx.l d6,d1 fits.
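
For readers following the quoted loop, here is a rough C model of what the mw d1,d2 / mb d0,d2 pair and the two-texel word write compute. It is only a sketch under assumptions not stated in the thread: mw and mb are taken to be macros for move.w and move.b, the upper word of d2 is assumed to hold a 64K-aligned texture base, and the function names are made up for illustration. It says nothing about 060 timing.

```c
#include <stdint.h>

/* Model of "mw d1,d2 / mb d0,d2" (assumed to be move.w / move.b).
   The upper word of d2 is kept (assumed texture base), bits 8..15
   come from d1 and bits 0..7 come from d0. The second move needs the
   result of the first, which is the data dependency discussed above. */
static uint32_t texel_address(uint32_t d2, uint32_t d1, uint32_t d0)
{
    d2 = (d2 & 0xFFFF0000u) | (d1 & 0x0000FFFFu); /* move.w d1,d2 */
    d2 = (d2 & 0xFFFFFF00u) | (d0 & 0x000000FFu); /* move.b d0,d2 */
    return d2;
}

/* Model of "mw (a0),d3 ... mb (a0),d3 ... mw d3,(a1)+".
   off0 and off1 are offsets into a 64K texture. */
static uint16_t pack_two_texels(const uint8_t *tex, uint32_t off0, uint32_t off1)
{
    /* Big-endian word read: first texel lands in the high byte, its
       neighbour in the low byte. The & 0xFFFF wrap only keeps this
       model inside the array; real hardware reads the next byte. */
    uint16_t d3 = (uint16_t)((tex[off0 & 0xFFFFu] << 8) |
                             tex[(off0 + 1) & 0xFFFFu]);
    /* Byte read of the second texel replaces the low byte only. */
    d3 = (uint16_t)((d3 & 0xFF00u) | tex[off1 & 0xFFFFu]);
    return d3; /* stored big-endian: first texel, then second */
}
```

The word read is what lets one mw d3,(a1)+ store both texels; it is also the access that can straddle a longword boundary, as mentioned in the quoted text.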
-
- : > I agree that doing a word read can cross a long boundary and require 2
- : > accesses... But if it's a problem in other uses of the loop above,
- : > it's simple to make it write to (a1)+ vs d3.
-
- : Have you tried to do
- : mb (a0),d3
- : lsl.w #8,d3
-
- It modifies the X flag (which the addx chain needs), so I can't use it,
- and it's probably slower.
-
- Something I thought would be faster than the above:
-
- REPEAT 16
- move.w (a2)+,d0 ; precalculated stepping
- move.l d0,a0
- move.b (a0),(a1)+
- ENDR
-
- But actually it's not.
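
As an illustration of the table-driven variant above, a minimal C sketch follows. It assumes a 256x256 texture with V in the high byte of each 16-bit offset and 8.8 fixed point stepping; the names build_offsets and span_from_table are hypothetical, and the sketch models only what is computed, not how fast it runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table builder: u, v, du, dv are 8.8 fixed point.
   Each entry is (integer V << 8) | integer U, i.e. an offset into a
   256x256 texture (an assumption, not stated in the post). */
static void build_offsets(uint16_t *offsets, size_t count,
                          uint16_t u, uint16_t v, uint16_t du, uint16_t dv)
{
    for (size_t i = 0; i < count; i++) {
        offsets[i] = (uint16_t)((v & 0xFF00u) | (u >> 8));
        u = (uint16_t)(u + du); /* wraps at 256 texels */
        v = (uint16_t)(v + dv);
    }
}

/* Model of the unrolled loop body:
     move.w (a2)+,d0    -> offsets[i]
     move.l d0,a0       -> tex + offsets[i] (upper word of d0 assumed
                           to select the texture)
     move.b (a0),(a1)+  -> dst[i] = texel                            */
static void span_from_table(uint8_t *dst, const uint8_t *tex,
                            const uint16_t *offsets, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = tex[offsets[i]];
    }
}
```

The table read adds one memory access per pixel on top of the texel fetch, which is one plausible reason this version did not come out faster.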
-
- Stephan
-