home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!mcsun!sunic!dkuug!diku!torbenm
- From: torbenm@diku.dk (Torben AEgidius Mogensen)
- Newsgroups: comp.graphics
- Subject: Re: Fixed point sqrt anyone?
- Message-ID: <1993Jan6.110342.12779@odin.diku.dk>
- Date: 6 Jan 93 11:03:42 GMT
- References: <1idsq6INNced@flop.ENGR.ORST.EDU>
- Sender: torbenm@thor.diku.dk
- Organization: Department of Computer Science, U of Copenhagen
- Lines: 161
-
- murrayk@prism.CS.ORST.EDU (the Dodger) writes:
-
-
- >Well? Does anyone know of a quick and dirty way to get a reasonably accurate
- >square root of a fixed point integer number? My numbers are stored in a 16.16
- >bit format. I've heard of a few ways, but I'm open for ideas. I just want to
- >keep on avoiding floating point for ethical reasons. 8^)
-
- If you have room for a table with 2^16 32 bit words, you can get a
- good approximation using a table lookup, a division and a few simple
- operations.
-
- Let your number be a+b, where a is the integer part and b the
- fractional part. You then want a square root of the form x+y, where
- (x+y)^2 = a+b. Now, (x+y)^2 = x^2+2xy+y^2. If we assume x^2=a, then
- 2xy+y^2=b. If we assume y^2 is very small, we can approximate by
- saying 2xy=b, so y=b/2/x. We can make a table for the square root of a
- (which is a 16 bit integer). Thus we can find x by a single table
- lookup. Finding y then requires a right-shift and a division. The
- approximate square root is the x+y.
-
- Note that the assumption that y^2 is very small is only valid if a is
- large. If a is less than 2^8, you can shift the entire number 8 bits
- left, do the square root and then shift the result 4 bits right.
-
- If you want a more exact solution, the code below should do. If you
- have to implement division by a shift-and-compare loop, the final code
- takes time comparable to about 2 divisions, depending on your
- processor. On the ARM, the time is about 1.7 divisions.
-
- The description below is extracted from some earlier postings in
- comp.sys.acorn. Though the code is for ARM, it should be easy to
- modify to your favourite processor (if it is reasonably
- well-designed). The algorithm is for normalized numbers, but it should
- be relatively easy to modify to fixed point.
-
- If your number is of the form a*2^n, where 0.5 <= a < 1.0 then do the
- following:
-
- sqrt(a*2^(2n)) = sqrt(a)*2^n
- sqrt(a*2^(2n+1)) = sqrt(a/2)*2^(n+1)
-
- This reduces the problem to finding a square root of a number a,
- 0.25 <= a < 1.0. This can be done by this method, which should have
- a cost comparable to 2 or 3 divisions.
-
- { 0.25 <= a < 1.0 }
- x := 4*(a-0.25);
- r := 1;
- n := 1;
- while (n<number-of-bits) and (a<>0) do begin
- { invariant: r^2+x = 4^n*a, x<2*r+1 }
- n := n+1;
- t := 4*r+1;
- x := 4*x;
- r := 2*r;
- if x >= t then begin
- x := x-t;
- r := r+1
- end
- end;
- r := r/2^n;
- { sqrt(a)-r < 2^-n }
-
- Note that only addition, subtractions and multiplications by 2 or 4
- are used in each part of the loop. At the end a single n-bit shift is
- done. Note that r and t are integers, while x is a real.
-
- The (a<>0) test allows you to exit the loop if an exact root is found,
- but it can be omitted without causing error (so the loop can be
- unwound).
-
- We can reformulate the algorithm to keep x < 1.0 throughout the
- algorithm:
-
- { 0.25 <= a < 1.0 }
- x := (a-0.25)/2;
- r := 1;
- for n := 2 to number-of-bits do begin
- { invariant: r^2+2^n*x= 4^n*a, x<(2*r+1)*2^-n }
- t := 4*r+1;
- x := 2*x;
- r := 2*r;
- if x >= t/2^(n+1) then begin
- x := x-t/2^(n+1);
- r := r+1
- end
- end;
- r := r/2^number-of-bits;
- { sqrt(a)-r < 2^-number-of-bits }
-
- We can actually get this better by avoiding the doubling of r in
- each iteration. This way r and t are no longer integers, but the code
- is better anyway:
-
- { 0.25 <= a < 1.0 }
- x := a-0.25;
- r := 0.5;
- for n := 2 to number-of-bits do begin
- { invariant: r^2+x/2^(n-2) = a, x<r+2^-(n+1) }
- t := r+2^-(n+1);
- x := 2*x
- if x >= t then begin
- x := x-t;
- r := r+2^-n
- end
- end;
- { sqrt(a)-r < 2^-number-of-bits }
-
- There are, however, still a small problem: x might be greater than 1
- after the doubling. We can solve this by delaying the doubling:
-
- { 0.25 <= a < 1.0 }
- x := a-0.25;
- r := 0.5;
- for n := 2 to number-of-bits do begin
- { invariant: r^2+x/2^(n-2) = a, x<r+2^-(n+1) }
- t := r+2^-(n+1)
- if x >= t/2 then begin
- x := x-t/2;
- r := r+2^-n
- end;
- x := 2*x
- end;
- { sqrt(a)-r < 2^-number-of-bits }
-
- This is shown in ARM assembler below. The number of cycles required
- for a 32 bit square root are 154. Note the use of conditional moves
- and adds, which allows efficient unfolding of the loop. Note also the
- one-cycle shift-and-subtract operation.
-
- \ on entry Ra holds a
- \ on exit Rr holds sqrt(a)
- \ uses Rx and Rt (either can be equal to Ra)
- SUB Rx,Ra,#2^30 \ x := a-0.25;
- MOV Rr,#2^31 \ r := 0.5;
-
- ADD Rt,Rr,#2^29 \ t := r+2^-(n+1);
- SUBS Rt,Rx,Rt,LSR #1 \ if x >= t/2 then begin
- MOVHS Rx,Rt \ x := x-t/2;
- ADDHS Rr,Rr,#2^30 \ r := r+2^-n
- MOV Rx,Rx,ASL #1 \ x := 2*x;
-
- ...
-
- ADD Rt,Rr,#2^0 \ t := r+2^-(n+1);
- SUBS Rt,Rx,Rt,LSR #1 \ if x >= t/2 then begin
- MOVHS Rx,Rt \ x := x-t/2;
- ADDHS Rr,Rr,#2^1 \ r := r+2^-n
- MOV Rx,Rx,ASL #1 \ x := 2*x;
-
- \ t := r+2^-(n+1);
- \ { 2^-(n+1) is less than precision }
- SUBS Rt,Rx,Rr,LSR #1 \ if x >= t/2 then begin
- \ x := x-t/2; { x not needed }
- ADDHS Rr,Rr,#2^0 \ r := r+2^-n
- \ x := 2*x; { x not needed }
- \ at exit: 0.5 <= Rr < 1.0 when considered as a mantissa
-
-
- Torben Mogensen (torbenm@diku.dk)
-