- <head>
- <title="...forever...">
- <font=monaco10.fnt>
- <font=newy36.fnt>
- <font=time24.fnt>
- <image=back.raw w=256 h=256 t=-1>
- <buf=3832>
- <bgcolor=-1>
- <background=0>
- <link_color=253>
- <module=console.mod>
- <pal=back.pal>
- colors:
- 251 - black
- </head>
- <body>
- <frame x=0 y=0 w=640 h=3832 b=-1 c=-1>
-
-
- - dsp arithmetic --------------------------------------------------------------
-
- i'll just sum up a couple of situations that often occur. let's start with a
- small one:
-
- shifting with the dsp is a bit shitty. you're stuck with only shifting a bit at
- a time (1 bit in 2 cycles) which brings down performance to the good old 68000.
- only in cases of extreme nostalgia would we find this a preferable thing ;)
-
- rep #8
- lsr a
-
- this shifts a down by 8. the 'rep' takes 4 and the 'lsr' takes 2*8=16, so a
- total of 20 cycles. this is where the weirdness of dsp starts. a multiply is
- actually faster than shifting:
-
- ; init
- move #>$008000,x0
- ; x0=scalar, x1=value to be shifted
- mpy x0,x1,a ; a=value>>8
-
- this takes only 6 cycles (4 for the two-word move, 2 for the mpy). and if you
- use it in a loop (x0 already initialized) then only 2.
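-
- if you want to check the arithmetic without a falcon at hand, here's a small
- python sketch of the fractional multiply. it's only a toy model (the helper
- names are made up for this example, and overflow of -1*-1 is ignored), but it
- shows why multiplying by $008000 leaves value>>8 in a1.
-
```python
# toy model of the 56001 fractional multiply: operands are 24-bit signed
# words read as fractions, the product is 48 bits wide (a1:a0)
def to_signed24(word):
    """interpret a 24-bit word as a signed value."""
    word &= 0xFFFFFF
    return word - 0x1000000 if word & 0x800000 else word

def mpy(x, y):
    """return (a1, a0) of the fractional product x*y."""
    product = (to_signed24(x) * to_signed24(y)) << 1   # fractional: one extra left shift
    a1 = (product >> 24) & 0xFFFFFF                    # upper word
    a0 = product & 0xFFFFFF                            # lower word
    return a1, a0

SHIFT8 = 0x008000                 # 2^-8 as a 24-bit fraction
a1, a0 = mpy(SHIFT8, 0x123456)
print(hex(a1), hex(a0))           # 0x1234 0x560000
```
-
- note that the bits that got shifted out are not lost, they sit in a0.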
-
- let's say you want to scale a difference.
-
- res=s(v1-v0)
-
- where 's' is the scaler, 'v1' and 'v0' are end and start values respectively.
- you could do it like this:
-
- ; a=v1, x0=v0, y0=s
- sub x0,a ; a=dv=v1-v0
- move a,x1 ; x1=dv
- mpy y0,x1,b ; b=s*dv=s(v1-v0)
-
- hmm.. that's the straightforward implementation you'd write on most traditional
- processors. still, this amounts to 6 cycles (3 instructions at 2 cycles each).
- let's try a more wacky approach:
-
- ; x1=v1, x0=v0, y0=s
- mpy +x1,y0,a ; a=s*v1
- mac -x0,y0,a ; a=s*v1-s*v0=s(v1-v0)
-
- weirdness, by using only multiplications and macs you get a 4 cycle version.
- and this situation does occur in a lot of algorithms, for example (bi)linear
- interpolation.
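-
- a quick python sketch to convince yourself both versions land on the same
- accumulator value. it's a toy model with invented function names: the
- accumulator is just a python int, so saturation and overflow are ignored.
-
```python
# toy model: signed 24-bit operands, accumulator kept as a plain python int
def mpy(x, y):
    """fractional product of two signed values (one extra left shift)."""
    return (x * y) << 1

def scale_sub_then_mpy(v0, v1, s):
    dv = v1 - v0                  # sub x0,a / move a,x1
    return mpy(s, dv)             # mpy y0,x1,b

def scale_mpy_then_mac(v0, v1, s):
    acc = mpy(v1, s)              # mpy +x1,y0,a
    acc -= mpy(v0, s)             # mac -x0,y0,a
    return acc

s = 0x400000                      # 0.5 as a 24-bit fraction
print(scale_sub_then_mpy(100, 300, s) == scale_mpy_then_mac(100, 300, s))   # True
```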
-
- finally, you might know the dsp suffers from a slow div. in fact, a full
- division instruction isn't even available. the 'div' we speak of is only a
- single iteration, and doing a full 24bit division with it is no picnic.
-
- andi #$FE,ccr
- rep #24 ; do 24 iterations
- div x0,a ; perform iteration.
- ; a1=rest, a0=quotient
-
- the 'rep' is 4 cycles and the 'andi' takes 2. the 24 iterations add another
- 24*2=48, which makes 54 cycles in total. certainly we can't live with that.
-
- sure you can do it faster. for instance we only want to divide 16bit numbers.
-
- ; init
- move #>$000080,x0
-
- ; u/v, x1=u, y1=v
- mpy x0,x1,a ; a0=u<<8
- andi #$FE,ccr
- rep #16
- div y1,a
- ; a0=quotient
-
- the 8 iterations we dropped save 16 cycles, but the 'mpy' costs 2 additional
- ones (6 if you count the init), resulting in a total improvement of 14 cycles:
- 40 instead of 54. sure, that's no big change, but it's a start.
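-
- the cycle bookkeeping for both division loops, as a small python tally. the
- per-instruction costs are simply the ones quoted in this article; the dict and
- function names are made up for the example.
-
```python
# cycle costs as quoted in the text
CYCLES = {"rep": 4, "andi": 2, "mpy": 2, "div": 2}

def rep_block(count, op):
    """cost of 'rep #count' followed by a single repeated instruction."""
    return CYCLES["rep"] + count * CYCLES[op]

div24 = CYCLES["andi"] + rep_block(24, "div")                  # full 24bit division
div16 = CYCLES["mpy"] + CYCLES["andi"] + rep_block(16, "div")  # 16bit version, x0 preloaded

print(div24, div16, div24 - div16)    # 54 40 14
```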
-
- the real speedup lies in the following fact:
-
- u/v = u * (1/v)
-
- yeah, that's just a dumb way of representing it you think. indeed it is ;) but
- it's also a little push in the right direction. see the '*'? notice anything
- special about it? no? doh! this isn't working ;)
-
- the '*' of course denotes multiplication, and of course this is a fast little
- operation on the dsp. but now you still have the '/' for the division. indeed,
- and we need to get rid of it, cos otherwise you have just found a silly way of
- writing down the same thing.
-
- how about a nice cup of precalc you say? and indeed why not. just make a nice
- table to index with your v. an inverse table that is. just precalculate all
- your inverses into it and you're done. then assuming the inverse table is
- initialized:
-
- ; x0=u, x1=v
- move #inverse_table,r0
- move x1,n0 ; n0=v
- nop
- move x:(r0+n0),x1 ; x1=1/v
- mpy x0,x1,a ; a=u*(1/v)=u/v
-
- of course we can kill the nop by overlapping with another rout where possible.
- but you can see the idea. it's a whole lot faster. just one problem. the range
- of your 'v' is not endless. typically coders have used table sizes anywhere from
- 256 up to 4096 entries. enough for some cases. but for fixedpoint precision
- it's shit.
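-
- here's a python sketch of what the table and the lookup boil down to. the
- table size, the names and the choice to round the entries up are assumptions
- made for this example, not something taken from a real demo. rounding up keeps
- exact multiples from coming out one too low, as long as u*v stays below 2^23.
-
```python
ONE = 1 << 23                  # 1.0 in 24-bit fractional format
TABLE_SIZE = 256               # assumed size, anything from 256 to 4096 is common

# entries 0 and 1 stay empty: 1/0 doesn't exist and 1.0 doesn't fit in a
# 24-bit fraction. the other entries are 2^23/v, rounded up.
inverse_table = [0, 0] + [(ONE + v - 1) // v for v in range(2, TABLE_SIZE)]

def divide(u, v):
    """u/v as one table lookup plus one fractional multiply."""
    product = (u * inverse_table[v]) << 1    # mpy x0,x1,a
    return product >> 24                     # a1 = upper word = quotient

print(divide(1000, 8), divide(100, 3), divide(6, 6))    # 125 33 1
```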
-
- so what now? go back to div iterations for big numbers? surely that's not the
- ideal solution. and indeed it is not. at least not if you don't mind a small
- error in calculation. the trick is now to use multiple passes through your
- small inverse table.
-
- let's explain this before we spill some code. we start here using decimals to
- keep it simple. let's say we only have a 10 word inverse table:
-
- 0: -
- 1: 1
- 2: .5
- 3: .333..
- 4: .25
- 5: .2
- 6: .166..
- 7: .142..
- 8: .125
- 9: .111..
-
- and now we want to calculate 1/22. surely, we can't look this up. for this
- trick to work we need to extend the table to twice its range:
-
- 0: - 10: .1
- 1: 1 11: .0909..
- 2: .5 12: .0833..
- 3: .333.. 13: .0769..
- 4: .25 14: .0714..
- 5: .2 15: .0666..
- 6: .166.. 16: .0625
- 7: .142.. 17: .0588..
- 8: .125 18: .0555..
- 9: .111.. 19: .0526..
-
- we can easily see that by factorising the denominator we get:
-
- 1/22 = 1/2 * 1/11
-
- but of course, if you know a little about algebra you know that not all
- numbers can be factorised ;) so now we have to take a step back from exact
- calculation and go for an approximation.
-
- what we do is we split the denominator up into digits. we now have the upper
- digit '2' and the lower digit '2'. if we divide the denominator by the upper
- part:
-
- u=denom/upper = 22/2 = 11
-
- we can now look up its inverse and we get 1/u=.0909.. we then also calculate
- the inverse of the upper digit:
-
- v=1/upper=1/2=.5
-
- we then multiply (1/u)*v and get .04545.. = 1/22 hurrah!
-
- of course taking 23 as the denominator will give a slight error! in this case:
-
- u=23/2 = 11.5 which will be rounded or truncated depending on your taste.
- please note rounding will require an extra entry in the inverse table!
- we just assume truncation here..
- u=11
- v=1/2=.5
- (1/u)*v = .04545 > 1/23 ~= .04348
-
- you can see you have a 4.5% deviation in this case, which is undesirable.
- of course when using 8bit radix instead of digits, errors quickly shrink to ~1%.
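-
- the decimal example is easy to play with in python. this is just a sketch of
- the idea with truncation, using floats instead of fixedpoint; the names are
- made up, and it only handles two-digit denominators (10..99). it prints the
- approximation and its relative error for 22 and 23.
-
```python
# 20-entry table, inv[n] = 1/n (entry 0 is unused)
inv = [None] + [1.0 / n for n in range(1, 20)]

def approx_inverse(denom):
    """two-pass lookup for a two-digit denominator (10..99)."""
    upper = denom // 10           # upper digit, 1..9
    u = denom // upper            # denom/upper truncated, always lands in 10..19
    return inv[u] * inv[upper]    # (1/u) * (1/upper) ~= 1/denom

for denom in (22, 23):
    approx = approx_inverse(denom)
    error = abs(approx * denom - 1.0)
    print(denom, round(approx, 5), f"{error:.1%}")
```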
-
- here's the implementation i use for 15bit numbers:
-
- ; Calc 1/Z.
- ; y:(r2)=inverse table (256 entries)
- ; y:(r3)=1/128
- ; a=Z
- move a,x1 y:(r3),y1 ; x1=Z, y1=1/128
- mpyr x1,y1,a ; a=upper=Z>>7
- move a,n2 ; n2=upper
- nop
- move y:(r2+n2),y1 ; y1=v=1/upper
- mpyr y1,x1,a ; a=u=Z/upper
- move a,n2 ; n2=u
- nop
- move y:(r2+n2),x1 ; x1=1/u
- mpyr y1,x1,a ; a=(1/u)*v~=1/Z
-
- as you can see this takes a total of 20 cycles (10 instructions at 2 each),
- which is a healthy improvement over the 40 needed for the 16 bit division
- algorithm. also, please note that it can be combined (in parallel) with other
- code for efficiency. for perspective correction, for instance, this algorithm
- is definitely preferred.
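-
- to get a feeling for the error of the two-pass trick with a 256 entry table,
- here's a python model. it is not a bit-exact copy of the dsp code: it uses
- floats for the table and truncates where the dsp code rounds (mpyr), and the
- direct lookup for small z is an addition made for this sketch.
-
```python
TABLE = [0.0] + [1.0 / n for n in range(1, 256)]    # 256 entries, entry 0 unused

def approx_inverse(z):
    """two-pass 1/z for 0 < z < 32768, truncating like the decimal example."""
    if z < 256:
        return TABLE[z]               # small values fit the table directly
    upper = z >> 7                    # upper part, 2..255
    u = z // upper                    # z/upper truncated, always 128..191 here
    return TABLE[u] * TABLE[upper]    # (1/u)*(1/upper) ~= 1/z

worst = max(approx_inverse(z) * z - 1.0 for z in range(1, 32768))
print(f"worst relative error: {worst:.2%}")
```
-
- since u is never smaller than 128 and truncation drops less than 1, the
- relative error in this model stays below 1/128.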
-
- anyway, it's amazing i figured this out by myself. i even had to rethink the
- whole process when i saw the code again. anyway, i hope this makes clear that
- you can do some pretty silly stuff with the 56001.
-
- - short -----------------------------------------------------------------------
-
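- a short bit about short addressing modes. the 56001 provides a lot of short
- modes. these are denoted by the '<', which forces the short form ('>' forces
- the long one). you might know you can have 8bit short immediate data moves.
- what's less obvious is that a short immediate moved into a whole data register
- (x0, x1, y0, y1, a or b) lands in the top 8 bits, so it ends up shifted up: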
-
- move #<1,a ; a=$00.010000.000000 !
- (is _not_ the same as)
- move #>1,a ; a=$00.000001.000000
-
- of course the <1 saves a program word and executes faster, but the result isn't
- the same. however, try this:
-
- move #<1,a1 ; a=$??.000001.??????
-
- it saves a program word and some cycles just as '#<1,a', but doesn't shift it
- up, since a1 on its own is treated as a plain 24bit register! of course you need
- to be careful with the contents of a2 and a0, but in some cases this can be
- solved in a quite simple fashion.
-
- of course the address registers r0..r7 can use similar short addressing modes.
-
- - conclusion ------------------------------------------------------------------
-
- i provided a lot of tips and tricks and hope i also guided some beginners round
- the potential pitfalls of dsp programming. as you might see there is a lot more
- than meets the eye. but this is actually the strength of the dsp. its
- limitations actually force you to optimise. that's the beauty of coding this
- little beast i think =)
-
- - final words -----------------------------------------------------------------
-
- if you want to know more, a good place to check out is mikro.atari.org. this
- guy has tons of docs and also a lot about falcon programming and its dsp. also
- feel free to contact me at: pietervdmeer@netscape.net for any further
- questions.
-
-
- -- - --- -- -------------------------------------------------------------------
- CHOSNECK 4th appearance contact us:
- done by the dream survivors greymsb@poczta.fm
- ----------------------------------------------------------------- -- - --- ----
- </frame>
- </body>
-
-