NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / comp / graphics / 13452 < prev next >

Wrap

Internet Message Format | 1993-01-06 | 5.3 KB

Path: sparky!uunet!mcsun!sunic!dkuug!diku!torbenm From: torbenm@diku.dk (Torben AEgidius Mogensen) Newsgroups: comp.graphics Subject: Re: Fixed point sqrt anyone? Message-ID: <1993Jan6.110342.12779@odin.diku.dk> Date: 6 Jan 93 11:03:42 GMT References: <1idsq6INNced@flop.ENGR.ORST.EDU> Sender: torbenm@thor.diku.dk Organization: Department of Computer Science, U of Copenhagen Lines: 161 murrayk@prism.CS.ORST.EDU (the Dodger) writes: >Well? Does anyone know of a quick and dirty way to get a reasonably accurate >square root of a fixed point integer number? My numbers are stored in a 16.16 >bit format. I've heard of a few ways, but I'm open for ideas. I just want to >keep on avoiding floating point for ethical reasons. 8^) If you have room for a table with 2^16 32 bit words, you can get a good approximation using a table lookup, a division and a few simple operations. Let your number be a+b, where a is the integer part and b the fractional part. You then want a square root of the form x+y, where (x+y)^2 = a+b. Now, (x+y)^2 = x^2+2xy+y^2. If we assume x^2=a, then 2xy+y^2=b. If we assume y^2 is very small, we can approximate by saying 2xy=b, so y=b/2/x. We can make a table for the square root of a (which is a 16 bit integer). Thus we can find x by a single table lookup. Finding y then requires a right-shift and a division. The approximate square root is the x+y. Note that the assumption that y^2 is very small is only valid if a is large. If a is less than 2^8, you can shift the entire number 8 bits left, do the square root and then shift the result 4 bits right. If you want a more exact solution, the code below should do. If you have to implement division by a shift-and-compare loop, the final code takes time comparable to about 2 divisions, depending on your processor. On the ARM, the time is about 1.7 divisions. The description below is extracted from some earlier postings in comp.sys.acorn. Though the code is for ARM, it should be easy to modify to your favourite processor (if it is reasonably well-designed). The algorithm is for normalized numbers, but it should be relatively easy to modify to fixed point. If your number is of the form a*2^n, where 0.5 <= a < 1.0 then do the following: sqrt(a*2^(2n)) = sqrt(a)*2^n sqrt(a*2^(2n+1)) = sqrt(a/2)*2^(n+1) This reduces the problem to finding a square root of a number a, 0.25 <= a < 1.0. This can be done by this method, which should have a cost comparable to 2 or 3 divisions. { 0.25 <= a < 1.0 } x := 4*(a-0.25); r := 1; n := 1; while (n<number-of-bits) and (a<>0) do begin { invariant: r^2+x = 4^n*a, x<2*r+1 } n := n+1; t := 4*r+1; x := 4*x; r := 2*r; if x >= t then begin x := x-t; r := r+1 end end; r := r/2^n; { sqrt(a)-r < 2^-n } Note that only addition, subtractions and multiplications by 2 or 4 are used in each part of the loop. At the end a single n-bit shift is done. Note that r and t are integers, while x is a real. The (a<>0) test allows you to exit the loop if an exact root is found, but it can be omitted without causing error (so the loop can be unwound). We can reformulate the algorithm to keep x < 1.0 throughout the algorithm: { 0.25 <= a < 1.0 } x := (a-0.25)/2; r := 1; for n := 2 to number-of-bits do begin { invariant: r^2+2^n*x= 4^n*a, x<(2*r+1)*2^-n } t := 4*r+1; x := 2*x; r := 2*r; if x >= t/2^(n+1) then begin x := x-t/2^(n+1); r := r+1 end end; r := r/2^number-of-bits; { sqrt(a)-r < 2^-number-of-bits } We can actually get this better by avoiding the doubling of r in each iteration. This way r and t are no longer integers, but the code is better anyway: { 0.25 <= a < 1.0 } x := a-0.25; r := 0.5; for n := 2 to number-of-bits do begin { invariant: r^2+x/2^(n-2) = a, x<r+2^-(n+1) } t := r+2^-(n+1); x := 2*x if x >= t then begin x := x-t; r := r+2^-n end end; { sqrt(a)-r < 2^-number-of-bits } There are, however, still a small problem: x might be greater than 1 after the doubling. We can solve this by delaying the doubling: { 0.25 <= a < 1.0 } x := a-0.25; r := 0.5; for n := 2 to number-of-bits do begin { invariant: r^2+x/2^(n-2) = a, x<r+2^-(n+1) } t := r+2^-(n+1) if x >= t/2 then begin x := x-t/2; r := r+2^-n end; x := 2*x end; { sqrt(a)-r < 2^-number-of-bits } This is shown in ARM assembler below. The number of cycles required for a 32 bit square root are 154. Note the use of conditional moves and adds, which allows efficient unfolding of the loop. Note also the one-cycle shift-and-subtract operation. \ on entry Ra holds a \ on exit Rr holds sqrt(a) \ uses Rx and Rt (either can be equal to Ra) SUB Rx,Ra,#2^30 \ x := a-0.25; MOV Rr,#2^31 \ r := 0.5; ADD Rt,Rr,#2^29 \ t := r+2^-(n+1); SUBS Rt,Rx,Rt,LSR #1 \ if x >= t/2 then begin MOVHS Rx,Rt \ x := x-t/2; ADDHS Rr,Rr,#2^30 \ r := r+2^-n MOV Rx,Rx,ASL #1 \ x := 2*x; ... ADD Rt,Rr,#2^0 \ t := r+2^-(n+1); SUBS Rt,Rx,Rt,LSR #1 \ if x >= t/2 then begin MOVHS Rx,Rt \ x := x-t/2; ADDHS Rr,Rr,#2^1 \ r := r+2^-n MOV Rx,Rx,ASL #1 \ x := 2*x; \ t := r+2^-(n+1); \ { 2^-(n+1) is less than precision } SUBS Rt,Rx,Rr,LSR #1 \ if x >= t/2 then begin \ x := x-t/2; { x not needed } ADDHS Rr,Rr,#2^0 \ r := r+2^-n \ x := 2*x; { x not needed } \ at exit: 0.5 <= Rr < 1.0 when considered as a mantissa Torben Mogensen (torbenm@diku.dk)