home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.arch:10960 comp.lang.misc:3814
- Newsgroups: comp.arch,comp.lang.misc
- Path: sparky!uunet!sun-barr!cs.utexas.edu!uwm.edu!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!uni-heidelberg!clio!dentzer
- From: dentzer@clio.iwr.uni-heidelberg.de (Ralf Dentzer)
- Subject: Re: how to advocate new software/hardware features (Re: Hardware Support for Numeric Algorithms)
- Message-ID: <1992Nov20.135747.4642@sun0.urz.uni-heidelberg.de>
- Sender: news@sun0.urz.uni-heidelberg.de (NetNews)
- Organization: IWR, University of Heidelberg, Germany
- References: <Bxr8vG.IpI@mentor.cc.purdue.edu> <martin.721995695@bert> <martin.722176263@bert>
- Date: Fri, 20 Nov 92 13:57:47 GMT
- Lines: 161
-
- As some sort of compromise between portability and efficiency I would make
- the following proposal for a set of basic functions to deal with multiple
- digit integers. The main restriction for them is the imposition to have a
- C-function interface, but only thus portability can be guaranteed.
-
- Once the C-interface and a portable C implementation is established you can
- feel free to optimize -- use dirty programming tricks or a compiler with
- special features or finally use Assembler to exploit all the wonderful
- features of your processor.
-
- I would highly appreciate an optimized library of these functions
- accompaning every C-compiler or development kit I use. As the functions
- are very simple and similar to each other this is an easy task for
- someone acquainted with the characteristics of the processor and its
- assembler language.
-
- First there should be a data type "Digit_t" e.g., usually unsigned long
- or what else fits into one register, and a constant BitsPerDigit.
- Then come six functions for calculating with Digits, these provide
- access to carrys, the higher digit of a product and a division
- with two digit dividend:
-
- Digit_t DigitAdd(Digit_t *sum, Digit_t a, Digit_t b, Digit_t carry);
- /* *sum=LOW-DIGIT(a+b+carry);
- return HIGH-DIGIT(a+b+carry);
- */
- Digit_t DigitSub(Digit_t *diff, Digit_t a, Digit_t b, Digit_t carry);
- /* *diff=RESULT;
- return CARRY;
- where RESULT, CARRY are defined by:
- 2^BitsPerDigit >
- CARRY*2^BitsPerDigit + a - b - carry == RESULT >= 0
- */
- Digit_t DigitMult(Digit_t *prod, Digit_t a, Digit_t b, Digit_t carry);
- /* *prod=LOW-DIGIT(a*b+carry);
- return HIGH-DIGIT(a*b+carry);
- */
- Digit_t DigitMultAdd(Digit_t *res, Digit_t a, Digit_t b, Digit_t carry);
- /* *res=LOW-DIGIT(a*b+carry+*prod);
- return HIGH-DIGIT(a*b+carry+*prod);
- */
- Digit_t DigitMultSub(Digit_t *res, Digit_t a, Digit_t b, Digit_t carry);
- /* *res=RESULT;
- return RESULT;
- where RESULT, CARRY are defined by:
- 2^BitsPerDigit >
- CARRY*2^BitsPerDigit + *prod - a*b - carry == RESULT >= 0
- */
- Digit_t DigitDiv(Digit_t *quot, Digit_t h, Digit_t l, Digit_t d);
- /* Suppose: d>0 and h<d
- *quot=QUOT;
- return REM;
- where QUOT, REM are defined by:
- h*2^BitsPerDigit + l == d*QUOT + REM,
- 2^BitsPerDigit > REM >= 0
- */
-
- These functions are sufficient to build a multiple digit arithmetic,
- but due to function call overhead they are too slow.
- (More or less, depending on the processor; on a Sparc their
- performance is not that bad.)
- To get rid of this overhead we have to make an assumption about the
- way multiple digit integers are represented. As e.g. Geddes, Czapor and
- Labahn pointed out in their book "Algorithms for Computer Algebra"
- a representation as a vector (array) of digits is the most efficient
- way and should be preferred over e.g. linked lists.
-
- So we use the following functions to deal with vectors of l digits.
-
- Digit_t DigitVecAdd(Digit_t *sum, Digit_t *a, Digit_t *b, int l)
- /* sum[0..l-1] = a[0..l-1] + b[0..l-1]; return CARRY; */
- /* This notation means:
- Add the integers represented by the vectors a and b,
- write the lowest l digits of the result into the vector sum
- and return the carry of this operation.
- */
- Digit_t DigitVecSub(Digit_t *diff, Digit_t *a, Digit_t *b, int l)
- /* diff[0..l-1] = a[0..l-1] - b[0..l-1]; return CARRY; */
-
- I am not sure about having DigitVecAdd or the following function
- DigitVecAddCarry or both. DigitVecAdd is simpler but I can imagine
- situations where DigitVecAddCarry is more useful.
-
- Digit_t
- DigitVecAddCarry(Digit_t *sum, Digit_t *a, Digit_t *b, Digit_t carry, int l)
- /* sum[0..l-1] = a[0..l-1] + b[0..l-1] + carry; return CARRY; */
- Digit_t
- DigitVecSubCarry(Digit_t *diff, Digit_t *a, Digit_t *b, Digit_t carry, int l)
- /* diff[0..l-1] = a[0..l-1] - b[0..l-1] - carry; return CARRY; */
-
- Digit_t DigitVecMult(Digit_t *res, Digit_t *a, Digit_t m, int l)
- /* res[0..l-1] = a[0..l-1]*m; return CARRY; */
- Digit_t DigitVecMultAdd(Digit_t *res, Digit_t *a, Digit_t m, int l)
- /* res[0..l-1] += a[0..l-1]*m; return CARRY; */
- /* Special function for multiple digit multiplication */
- Digit_t DigitVecMultSub(Digit_t *res, Digit_t *a, Digit_t m, int l)
- /* res[0..l-1] -= a[0..l-1]*m; return CARRY; */
- /* Special function for multiple digit division */
- Digit_t DigitVecDiv(Digit_t *quot, Digit_t *a, Digit_t d, int l)
- /* quot[0..l-1] = a[0..l-1]/m;
- return a[0..l-1]%m;
- */
-
- Now how would optimized versions of these functions look like?
- I can report some of my experiences. First it makes not much difference
- to have the additions/subtractions in assembler. On a Mips processor
- there is no way of doing it better than with the C commands
- sum = a+b;
- carry = (sum<a);
- as these are just two machine instructions.
- On a Sparc at least the gcc compiler avoids branches in the above statement
- and needs three machine instructions. If you take the overhead for
- function call, loads/stores and loop control into account this is
- almost as good as possible.
-
- Something similar holds for DigitVecDiv, that should just contain
- calls to DigitDiv as these calls consume very little time compared
- to the actual division. But for DigitDiv itself it is important to
- use the best method available.
-
- If you just want to substitute the fast machine instructions for
- multiplication you should start with the DigitMult* functions, and if
- you are willing to do a bit more assembler you can go to DigitVecMult*
- now, DigitVecMultAdd being the most important. Optimizing these
- functions also gives you the chance to do instruction scheduling to
- get the best out of delay slots or supercsalar execution.
-
- Here are a few numbers for multiple precision multiplications of
- numbers with 80 decimal digits each on a SparcStation 2 with
- Digit_t == unsigned long, BitsPerDigit == 32:
-
- C: 403 (353 using karatsubas trick)
- DigitMultAdd in asm: 135
- DigitVecMultAdd in asm: 115
-
- On a Sparcstation 10-20 with 33 MHz Supersparc processor the best number
- achieved was 33. This was a speedup of about 10 compared to the C-version.
- But this is due to the usage of the Sparc version 7 multiplication
- routine in the C library on the machine I had access to (SunOS 4.1.3) :-)
-
- On Mips the corresponding numbers are:
-
- r3000 (33MHz) r4000 (50/100 MHz)
- C: 173 58
- DigitMultAdd in asm: 92 38
- DigitVecMultAdd in asm: 50 20
-
-
- I would be interested to hear any comments on the functions chosen,
- if some are missing (why?), if some are superfluous (why?),
- and mainly if they meet people's needs for portability and efficiency.
-
-
- Ralf Dentzer
- IWR
- Universitaet Heidelberg
- Im Neuenheimer Feld 368
- D-6900 Heidelberg
- Germany
-
- dentzer@kalliope.iwr.uni-heidelberg.de
-