NetNews Offline 2


Internet Message Format | 1996-08-05 | 3.9 KB

Path: nntp.teleport.com!sschaem
From: sschaem@teleport.com (Stephan Schaem)
Newsgroups: comp.sys.amiga.programmer
Subject: Re: TMapping again!
Date: 21 Feb 1996 14:35:35 GMT
Organization: Teleport - Portland's Public Access (503) 220-1016
Message-ID: <4gfajn$7bd@maureen.teleport.com>
References: <38232371@kone.fipnet.fi>
 <4fntd3$g56@sunsystem5.informatik.tu-muenchen.de>
 <38232442@kone.fipnet.fi> <4fvnjb$gdm@oreig.uji.es>
 <4g147q$sit@maureen.teleport.com> <4ga6lr$rp8@brachio.zrz.TU-Berlin.DE>
 <4gc5ur$brr@maureen.teleport.com> <4gcq6s$b1h@brachio.zrz.TU-Berlin.DE>
NNTP-Posting-Host: linda.teleport.com
X-Newsreader: TIN [version 1.2 PL2]

Philipp Boerker (rawneiha@hydra.zrz.TU-Berlin.DE) wrote:
: >: > I would unroll the inner loop, you can probably save hundreds of dbra
: >: If you have an object with, let's say, 5000 polys, unrolling will mean
: >: a big overhead because of the treatment of "pixels modulo unrolling".
: >: You may never enter that unrolled loop!

: > What do you mean by "pixel modulo unrolling"? Never enter the loop?!?!
: > Then you don't draw any pixels, so it's VERY fast :)

: If you do an inner loop mapping 16 pixels at once, you need to map
: number_of_pixels_you_have_to_map % 16 independently. For instance,
: 45 pixels to map would result in 2 executions of your 16-pixel loop
: plus 13 executions of a 1-pixel loop. number_pixel < 16 will not make use
: of your loop at all!

 Then see the following... it's not using the kind of unrolling you would
 use for an aligned 'copy', as in a flat-shading inner loop.

: > It's true that there is some overhead, but here is an example:
: >      ...
: >      moveq   #%1111,d3          ;2
: >      and.w   d2,d3              ;2
: >      neg.w   d3                 ;2
: >      lsr.w   #4,d2              ;4
: >      jmp     (.z,pc,d3.w*8)     ;10 = 20 per scanline
: > ..   REPEAT  16

: ^^ what a great label :)

 It is! I love it, one reason why basm supports it ;)
 Setup:
 ..   move.l  d0,(a1)+
      dbra    d1,..
 ..   move.l  d2,(a1)+
      dbra    d3,..
      rts
 Now, isn't this nice :)

: >      move.b  (a0,d0.w),(a1)
: >      addx.l  d1,d0
: >      adda.l  a2,a1
: >      ENDR
: >.z   dbra    d2,..              ;6 per pixel
: >      ...

 BTW, on an 030 you might want to freeze the cache here (if the function
 is called over and over), so the cache doesn't load data that will be
 executed once, then load the loop, then load data that will be executed
 once, etc.

: Hm, this is a wall mapping loop, I thought we were talking about
: polygon mapping, where you have lots of polys smaller than 16 pixels wide
: if you do *many* polys (therefore the number 5000).

 OK, replace the inner loop with your favorite mapping code. It doesn't
 matter, I just wanted to illustrate the method of unrolling.

: > Now if this 'polygon' is more than 4 pixels high it's worth it... well,
: > if it's a few pixels wide too :)
: ??? To use this loop you will have to have more than 16 pixels!
: The treatment of number_pixels mod 16 will add a big overhead for small
: polys, like I said above.

 Nope, from ZERO to 65536... and it unrolls any number, up to 16.
 If the width is 0, no pixel will be mapped and the dbra won't branch;
 if the width is 3, it will map 3 pixels and the dbra won't branch...
 the dbra will branch only if 16 or more pixels need to be mapped.
 Also, the whole unrolled loop won't be cached... only the part for the
 pixels you actually accessed...

: > If the poly were to be ~4x4 pixels I would think about rendering it
: > using the average color of the texture (precalculated of course) and a
: > flat shading function (using also the average of the light value at the
: > triangle vertices).
: You will hardly stuff all that into your 256 B cache. According to my
: experience it is better to have an ordinary loop and outer loops in cache
: than spending the cache on case-optimized loops.

 I was just talking about switching rendering method for tiny little
 things... You don't have to. But if I know a poly is 2 pixels high, why
 bother calculating texture stepping, lighting, etc. when it's just a
 speck on the screen? Or you can always have a level of detail for your
 object according to the z value, an even faster method.

 Stephan
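
For readers who do not read 68k assembly, the "jump into the middle of the unrolled loop" trick from the post can be sketched in C with a switch that enters a 16-times unrolled loop (the form usually called Duff's device). This is only an illustration under stated assumptions, not the poster's code: the function name `map_column`, the 16.16 fixed-point texture stepping, and the explicit zero-length guard are all made up for this sketch. The assembly version steps the texture with an `addx` carry trick and needs no guard, because `dbra` simply falls through when the block count is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the post): draw `count` texels down one
 * screen column. `u` and `du` are 16.16 fixed-point texture positions,
 * an assumption made for this sketch. */
void map_column(uint8_t *dst, ptrdiff_t stride, const uint8_t *tex,
                uint32_t u, uint32_t du, unsigned count)
{
#define PIXEL do { *dst = tex[u >> 16]; u += du; dst += stride; } while (0)
    unsigned rem = count & 15;                      /* leftover pixels */
    unsigned passes = (count >> 4) + (rem != 0);    /* trips through the loop */

    if (count == 0)
        return;                     /* nothing to draw */
    assert(dst != NULL && tex != NULL);

    switch (rem) {                  /* plays the role of the computed jmp */
    case 0:  do { PIXEL;            /* full pass: all 16 pixels */
    case 15:      PIXEL;
    case 14:      PIXEL;
    case 13:      PIXEL;
    case 12:      PIXEL;
    case 11:      PIXEL;
    case 10:      PIXEL;
    case 9:       PIXEL;
    case 8:       PIXEL;
    case 7:       PIXEL;
    case 6:       PIXEL;
    case 5:       PIXEL;
    case 4:       PIXEL;
    case 3:       PIXEL;
    case 2:       PIXEL;
    case 1:       PIXEL;
             } while (--passes > 0); /* plays the role of dbra d2,.. */
    }
#undef PIXEL
}
```

With a width of 3 the switch lands on `case 3`, three pixels are written and the loop exits without looping back. With a width of 45 the first trip writes 13 pixels and two full 16-pixel passes follow. No separate 1-pixel loop is needed for the remainder, which is the point made in the reply.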