No Fragments Archive 10: Diskmags

Text File | 2003-12-08 | 9.2 KB | 261 lines

<head>
<title="...forever...">
<font=monaco10.fnt>
<font=newy36.fnt>
<font=time24.fnt>
<image=back.raw w=256 h=256 t=-1>
<buf=3832>
<bgcolor=-1>
<background=0>
<link_color=253>
<module=console.mod>
<pal=back.pal>
colors:
251 - black
</head>
<body>
<frame x=0 y=0 w=640 h=3832 b=-1 c=-1>

- dsp arithmetic --------------------------------------------------------------

i'll just sum up a couple of situations that often occur. let's start with a
small one: shifting with the dsp is a bit shitty. you're stuck with shifting
one bit at a time (1 bit in 2 cycles), which brings performance down to that
of the good old 68000. only in cases of extreme nostalgia would we find this
a preferable thing ;)

        rep     #8
        lsr     a

this shifts a down by 8. the 'rep' takes 4 and the 'lsr' takes 2*8, so a
total of 20 cycles. this is where the weirdness of the dsp starts. a multiply
is actually faster than shifting:

; init
        move    #>$008000,x0
; x0=scalar, x1=value to be shifted
        mpy     x0,x1,a                 ; a=value>>8

this takes only 6 (5?) cycles. and if you use it in a loop (x0 initialized)
then only 2.

let's say you want to scale a difference.

        res=s(v1-v0)

where 's' is the scalar, and 'v1' and 'v0' are the end and start values
respectively. you could do it like this:

; a=v1, x0=v0, y0=s
        sub     x0,a                    ; a=dv=v1-v0
        move    a,x1                    ; x1=dv
        mpy     y0,x1,b                 ; b=s*dv=s(v1-v0)

hhmm.. a straightforward implementation, as on most traditional processors.
still, this amounts to 6 cycles. let's try a more wacky approach:

; x1=v1, x0=v0, y0=s
        mpy     +x1,y0,a                ; a=s*v1
        mac     -x0,y0,a                ; a=s*v1-s*v0=s(v1-v0)

weirdness: by using only a multiply and a mac you get a 4 cycle version. and
this situation does occur in a lot of algorithms, for example (bi)linear
interpolation.

finally, you might know the dsp suffers from a slow div. in fact, a full
division instruction isn't even available. the 'div' we speak of is only a
single iteration, and doing a full 24bit division with it is no picnic.

        andi    #$FE,ccr
        rep     #24                     ; do 24 iterations
        div     x0,a                    ; perform iteration.
                                        ; a1=rest, a0=quotient

the 'rep' is 4 cycles and the 'andi' takes 2. the 24 iterations add another
24*2=48 cycles, which makes 54 cycles in total. certainly we can't live with
that. sure, you can do it faster. for instance, say we only want to divide
16bit numbers.

; init
        move    #>$000080,x0
; u/v, x1=u, y1=v
        mpy     x0,x1,a                 ; a0=u<<8
        andi    #$FE,ccr
        rep     #16
        div     y1,a                    ; a0=quotient

we see 16 cycles are saved but 2 (or 5 with init) additional ones are taken,
resulting in a total improvement of 14 cycles. sure, that's no big change,
but it's a start. the real speedup lies in the following fact:

        u/v = u * (1/v)

yeah, that's just a dumb way of representing it, you think. indeed it is ;)
but it's also a little push in the right direction. see the '*'? notice
anything special about it? no? doh! this isn't working ;)

the '*' of course denotes multiplication, and of course this is a fast little
operation on the dsp. but now you still have the '/' for the division. indeed,
and we need to get rid of it, cos otherwise you have just found a silly way of
writing down the same thing.

how about a nice cup of precalc, you say? and indeed, why not. just make a
nice table to index with your v. an inverse table, that is. just precalculate
all your inverses into it and you're done. then, assuming the inverse table is
initialized:

; x0=u, x1=v
        move    #inverse_table,r0
        move    x1,n0                   ; n0=v
        nop
        move    x:(r0+n0),x1            ; x1=1/v
        mpy     x0,x1,a                 ; a=u*(1/v)=u/v

of course we can kill the nop by overlapping with another rout where possible.
but you can see the idea. it's a whole lot faster. just one problem: the range
of your 'v' is not endless. typically coders have used table sizes anywhere
from 256 up to 4096. enough for some cases, but for fixedpoint precision it's
shit.

so what now? go back to div iterations for big numbers? surely that's not the
ideal solution. and indeed it is not, at least not if you don't mind a small
error in the calculation. the trick is to make multiple passes through your
small inverse table. let's explain this before we spill some code.
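
before that, a quick host-side sketch of the single lookup, in python rather
than dsp code. it models 'mpy' on 24bit fractions as (u*inv)>>23. everything
in it is an assumption made for illustration and not taken from the original
routine: the names build_inverse_table and table_divide, the 256 entry size,
rounding the entries up, and leaving entries 0 and 1 unused (1/1 doesn't fit
in a signed fraction).

```python
# Host-side model of the inverse-table divide. This is NOT DSP code.
# Assumptions (illustrative only): 24-bit signed fractions, 256 entries,
# entries rounded up, entries 0 and 1 left as unused dummies.

FRAC_BITS = 23      # a 24-bit signed fraction holds word / 2**23
TABLE_SIZE = 256    # hypothetical size; the text mentions 256 up to 4096


def build_inverse_table(size=TABLE_SIZE):
    """Return table[v] ~= 1/v as a 24-bit fraction for 2 <= v < size."""
    table = [0, 0]  # 1/0 is undefined; 1/1 = 1.0 does not fit in a signed fraction
    for v in range(2, size):
        # Round up so that exact multiples (6/3, 100/4, ...) do not come out one too low.
        table.append(((1 << FRAC_BITS) + v - 1) // v)
    return table


def table_divide(u, v, table):
    """Approximate u // v with one lookup and one multiply."""
    if not 2 <= v < len(table):
        raise ValueError("v is outside the table range")
    # High word of the fractional product of integer u and fraction table[v].
    return (u * table[v]) >> FRAC_BITS


table = build_inverse_table()
print(table_divide(1000, 8, table))   # 1000/8 -> 125
```

asking for a v outside the table raises an error here, which is exactly the
range problem described above.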

we start here using decimals to keep it simple. let's say we only have a
10 word inverse table:

        0: -
        1: 1
        2: .5
        3: .333..
        4: .25
        5: .2
        6: .166..
        7: .142..
        8: .125
        9: .111..

and now we want to calculate 1/22. surely, we can't look this up. for this
trick to work we need to extend the table to twice its range:

        0: -            10: .1
        1: 1            11: .0909..
        2: .5           12: .0833..
        3: .333..       13: .0769..
        4: .25          14: .0714..
        5: .2           15: .0666..
        6: .166..       16: .0625
        7: .142..       17: .0588..
        8: .125         18: .0555..
        9: .111..       19: .0526..

we can easily see that by factorising the denominator we get:

        1/22 = 1/2 * 1/11

but of course, if you know a little about algebra, you know that not all
numbers can be factorised ;) so now we have to take a step back from exact
calculation and go for an approximation.

what we do is split the denominator up into digits. we now have the upper
digit '2' and the lower digit '2'. if we divide the denominator by the upper
part:

        u=denom/upper = 22/2 = 11

we can now look up its inverse and we get 1/u=.0909..

we then also look up the inverse of the upper digit:

        v=1/upper=1/2=.5

we then multiply (1/u)*v and get .04545.. = 1/22

hurrah! of course taking 23 as the denominator will give a slight error!
in this case:

        u=23/2 = 11.5

which will be rounded or truncated depending on your taste. please note
rounding will require an extra entry in the inverse table! we just assume
truncation here..

        u=11
        v=1/2=.5
        (1/u)*v = .04545.. > 1/23 ~= .04348

you can see you have a deviation of about 4.5% in this case, which is
undesirable. of course when using an 8bit radix instead of digits, errors
quickly shrink to ~1%.

here's the implementation i use for 15bit numbers:

; Calc 1/Z.
; y:(r2):inverse table (256 entries)
; y:(r3)=1/128
; a=Z
        move    a,x1    y:(r3),y1       ; x1=Z, y1=1/128
        mpyr    x1,y1,a                 ; a=upper=Z>>7
        move    a,n2                    ; n2=upper
        nop
        move    y:(r2+n2),y1            ; y1=v=1/upper
        mpyr    y1,x1,a                 ; a=u=Z/upper
        move    a,n2                    ; n2=u
        nop
        move    y:(r2+n2),x1            ; x1=1/u
        mpyr    y1,x1,a                 ; a=(1/u)*v~=1/Z

as you can see this takes a total of 20 cycles, which is a healthy
improvement over the 40 needed for the 16 bit division algorithm. also,
please note that it can be combined (in parallel) with other code for
efficiency. for perspective calculations, for instance, this algorithm is
definitely preferred.

anyway, it's amazing i figured this out by myself. i even had to rethink the
whole process when i saw the code again. anyway, i hope this makes clear that
you can do some pretty silly stuff with the 56001.

- short -----------------------------------------------------------------------

a short bit about short addressing modes. the 56001 provides a lot of short
modes. these are denoted by the '<'. as you might or might not know, this
stands for shifted up. you might know you can have 8bit short immediate data
moves.

        move    #<1,a                   ; a=$00.010000.000000

! (is _not_ the same as)

        move    #>1,a                   ; a=$00.000001.000000

of course the <1 saves a program word and executes faster, but the result
isn't the same. however, try this:

        move    #<1,a1                  ; a=$??.000001.??????

it saves a program word and some cycles just as '#<1,a' does, but doesn't
shift the value up! of course you need to be careful with the contents of a2
and a0, but in some cases this can be solved in a quite simple fashion.

of course the address registers r0..r7 can use similar addressing modes.

- conclusion ------------------------------------------------------------------

i provided a lot of tips and tricks and hope i also guided some beginners
round the potential pitfalls of dsp programming. as you might see there is a
lot more to it than meets the eye. but this is actually the strength of the
dsp. its limitations actually force you to optimise.
that's the beauty of coding this little beast i think =)

- final words -----------------------------------------------------------------

if you want to know more, a good place to check out is mikro.atari.org. this
guy has tons of docs and also a lot about falcon programming and its dsp.

also feel free to contact me at: pietervdmeer@netscape.net for any further
questions.

-- - --- -- -------------------------------------------------------------------
CHOSNECK 4th appearance                                   contact us:
done by the dream survivors                               greymsb@poczta.fm
----------------------------------------------------------------- -- - --- ----
</frame>
</body>