- Path: nntp.teleport.com!sschaem
- From: sschaem@teleport.com (Stephan Schaem)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: TMapping again!
- Date: 21 Feb 1996 14:35:35 GMT
- Organization: Teleport - Portland's Public Access (503) 220-1016
- Message-ID: <4gfajn$7bd@maureen.teleport.com>
- References: <38232371@kone.fipnet.fi> <4fntd3$g56@sunsystem5.informatik.tu-muenchen.de> <38232442@kone.fipnet.fi> <4fvnjb$gdm@oreig.uji.es> <4g147q$sit@maureen.teleport.com> <4ga6lr$rp8@brachio.zrz.TU-Berlin.DE> <4gc5ur$brr@maureen.teleport.com> <4gcq6s$b1h@brachio.zrz.TU-Berlin.DE>
- NNTP-Posting-Host: linda.teleport.com
- X-Newsreader: TIN [version 1.2 PL2]
-
- Philipp Boerker (rawneiha@hydra.zrz.TU-Berlin.DE) wrote:
- : >: > I would unroll the innerloop, you probably can save hundreds of dbra
-
- : >: If you have an object with let's say 5000 polys unrolling will mean
- : >: a big overhead because of the treatment of "pixels modulo unrolling".
- : >: You may never enter that unrolled loop!
-
- : > What do you mean by "pixel modulo unrolling"? Never enter the loop?!?!
- : > then you dont draw any pixels so its VERY fast :)
-
- : If you do an inner loop mapping 16 pixels at once you need to map
- : number_of_pixels_you_have_to_map % 16 independently. For instance,
- : 45 pixels to map would result in 2 executions of your 16-pixel loop
- : plus 13 executions of a 1-pixel loop. number_pixel < 16 will not make use
- : of your loop at all!
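-
- For reference, the split described above looks roughly like this in C.
- This is only a sketch: map_span_split and put_pixel are made-up names,
- and put_pixel is a plain copy standing in for whatever mapping step you
- really use.
-
```c
#include <stddef.h>

/* Hypothetical per-pixel step; a plain copy stands in for real mapping. */
static void put_pixel(unsigned char *dst, const unsigned char *src, size_t i)
{
    dst[i] = src[i];
}

/* Map n pixels as full blocks of 16 plus a 1-pixel tail loop.
   *blocks and *singles report how often each loop body ran. */
static void map_span_split(unsigned char *dst, const unsigned char *src,
                           size_t n, size_t *blocks, size_t *singles)
{
    size_t i = 0;

    *blocks  = n / 16;   /* executions of the 16-pixel loop */
    *singles = n % 16;   /* executions of the 1-pixel loop  */

    for (size_t b = 0; b < *blocks; b++) {
        /* In real code these 16 steps would be written out, not looped. */
        for (int k = 0; k < 16; k++, i++)
            put_pixel(dst, src, i);
    }
    for (size_t s = 0; s < *singles; s++, i++)
        put_pixel(dst, src, i);
}
```
-
- With n = 45 this runs the 16-pixel body twice and the 1-pixel body 13
- times; with n < 16 the unrolled body is never entered.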
-
- Then see the following... it's not using the kind of unrolling you would
- use for an aligned 'copy', as in a flat-shading inner loop.
-
- : > Its true that there is some overhead, But here is an example:
-
- : > ...
- : > moveq #%1111,d3 ;2
- : > and.w d2,d3 ;2
- : > neg.w d3 ;2
- : > lsr.w #4,d2 ;4
- : > jmp (.z,pc,d3.w*8) ;10 = 20 per scanline
- : > .. REPEAT 16
- : ^^ what a great label :)
-
- It is! I love it, one reason why basm supports it ;)
-
- Setup:
- .. move.l d0,(a1)+
- dbra d1,..
- .. move.l d2,(a1)+
- dbra d3,..
- rts
-
- Now, isn't this nice :)
-
- : > move.b (a0,d0.w),(a1)
- : > addx.l d1,d0
- : > adda.l a2,a1
- : > ENDR
- : >.z dbra d2,.. ;6 per pixel
- : > ...
-
- BTW, on a 030, you might want to freeze the cache here (if the function
- is called over and over...) so the cache doesn't load code that will be
- executed once, then load the loop, then load code that will be executed
- once, etc.
-
- : Hm, this is a wall mapping loop, I thought we were talking about
- : polygon mapping, where you have lots of polys smaller than 16 pixels width
- : if you do *many* polys (therefor the number 5000).
-
- OK, replace the inner loop with your favorite mapping code. It doesn't matter,
- I just wanted to illustrate the method of unrolling.
-
- : > Now if this 'polygon' is more than 4 pixels high it's worth it... well,
- : > if it's a few pixels wide too :)
-
- : ??? to use this loop you will have to have more than 16 pixels!
- : The treatment of number_pixels mod 16 will add a big overhead for small
- : polys, like I said above.
-
- Nope, from ZERO to 65535 (d2 is a word)... and any count below 16 runs
- straight through the unrolled code. If the width is 0, no pixel is mapped
- and the dbra won't branch; if the width is 3, it maps 3 pixels and the
- dbra won't branch... The dbra branches only if 16 or more pixels need to
- be mapped. Also, the whole unrolled loop won't be cached... only the part
- covering the pixels you actually accessed.
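-
- If the jmp trick is hard to read in assembler, here is roughly the same
- idea in C (a sketch, not my actual code: map_span_jump is a made-up name,
- the per-pixel step is a plain copy, and I unroll 8 times instead of 16 just
- to keep it short). The switch plays the role of the jmp into the unrolled
- code, and the loop test at the bottom plays the role of the dbra.
-
```c
#include <stddef.h>

/* Copy n pixels through an 8-way unrolled body that is entered part-way.
   Returns the number of pixels written. */
static size_t map_span_jump(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    unsigned char *d = dst;
    size_t full = n / 8;            /* full passes, like lsr.w #4,d2 */

    switch (n % 8) {                /* skip the steps we do not need */
        do {
                *d++ = *src++;      /* reached only from the loop top */
    case 7:     *d++ = *src++;
    case 6:     *d++ = *src++;
    case 5:     *d++ = *src++;
    case 4:     *d++ = *src++;
    case 3:     *d++ = *src++;
    case 2:     *d++ = *src++;
    case 1:     *d++ = *src++;
    case 0:     ;                   /* remainder 0: go straight to the test */
        } while (full-- > 0);       /* branches back only for full passes */
    }
    return (size_t)(d - dst);
}
```
-
- With n = 3 it enters at "case 3", maps 3 pixels and falls out without
- looping; with n = 0 nothing is mapped at all. There is no separate 1-pixel
- loop anywhere.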
-
- : > If the poly were to be ~4x4 pixels I would think of rendering it using
- : > the average color of the texture (precalculated of course) and a
- : > flat shading function (using also the average of the light value at the
- : > triangle vertices).
-
- : You will hardly stuff all that into your 256 B cache. According to my
- : experience it is better to have an ordinary loop and outer loops in cache
- : than spending the cache for case-optimized loops.
-
- I was just talking about switching rendering methods for tiny little
- things... You don't have to. But if I know a poly is 2 pixels high, why
- bother calculating texture stepping and lighting etc. when it's just a speck
- on the screen? Or you can always have a level of detail for your object
- according to the z value, an even faster method.
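-
- Something along these lines, sketched in C. All names are made up, the
- 4-pixel threshold is just the "~4x4" figure from above and should be tuned
- by measuring, and average_texel assumes the texels are plain intensity
- values (averaging palette indices would need a color lookup instead).
-
```c
#include <stddef.h>

enum render_mode { RENDER_FLAT, RENDER_TEXTURED };

/* Precalculated once per texture: mean texel value.
   Assumes texels are intensities, not palette indices. */
static unsigned char average_texel(const unsigned char *tex, size_t count)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < count; i++)
        sum += tex[i];
    return count ? (unsigned char)(sum / count) : 0;
}

/* Tiny polys get the flat fill; everything else gets the texture mapper.
   The threshold of 4 is an assumption, not a measured value. */
static enum render_mode pick_render_mode(int width, int height)
{
    return (width <= 4 && height <= 4) ? RENDER_FLAT : RENDER_TEXTURED;
}
```
-
- The test is one compare per poly, so it costs next to nothing compared
- with setting up texture steps for something only a few pixels big.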
-
- Stephan
-