NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / arch / 10960 < prev next >

Wrap

Internet Message Format | 1992-11-19 | 7.3 KB

Xref: sparky comp.arch:10960 comp.lang.misc:3814 Newsgroups: comp.arch,comp.lang.misc Path: sparky!uunet!sun-barr!cs.utexas.edu!uwm.edu!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!ira.uka.de!uni-heidelberg!clio!dentzer From: dentzer@clio.iwr.uni-heidelberg.de (Ralf Dentzer) Subject: Re: how to advocate new software/hardware features (Re: Hardware Support for Numeric Algorithms) Message-ID: <1992Nov20.135747.4642@sun0.urz.uni-heidelberg.de> Sender: news@sun0.urz.uni-heidelberg.de (NetNews) Organization: IWR, University of Heidelberg, Germany References: <Bxr8vG.IpI@mentor.cc.purdue.edu> <martin.721995695@bert> <martin.722176263@bert> Date: Fri, 20 Nov 92 13:57:47 GMT Lines: 161 As some sort of compromise between portability and efficiency I would make the following proposal for a set of basic functions to deal with multiple digit integers. The main restriction for them is the imposition to have a C-function interface, but only thus portability can be guaranteed. Once the C-interface and a portable C implementation is established you can feel free to optimize -- use dirty programming tricks or a compiler with special features or finally use Assembler to exploit all the wonderful features of your processor. I would highly appreciate an optimized library of these functions accompaning every C-compiler or development kit I use. As the functions are very simple and similar to each other this is an easy task for someone acquainted with the characteristics of the processor and its assembler language. First there should be a data type "Digit_t" e.g., usually unsigned long or what else fits into one register, and a constant BitsPerDigit. Then come six functions for calculating with Digits, these provide access to carrys, the higher digit of a product and a division with two digit dividend: Digit_t DigitAdd(Digit_t *sum, Digit_t a, Digit_t b, Digit_t carry); /* *sum=LOW-DIGIT(a+b+carry); return HIGH-DIGIT(a+b+carry); */ Digit_t DigitSub(Digit_t *diff, Digit_t a, Digit_t b, Digit_t carry); /* *diff=RESULT; return CARRY; where RESULT, CARRY are defined by: 2^BitsPerDigit > CARRY*2^BitsPerDigit + a - b - carry == RESULT >= 0 */ Digit_t DigitMult(Digit_t *prod, Digit_t a, Digit_t b, Digit_t carry); /* *prod=LOW-DIGIT(a*b+carry); return HIGH-DIGIT(a*b+carry); */ Digit_t DigitMultAdd(Digit_t *res, Digit_t a, Digit_t b, Digit_t carry); /* *res=LOW-DIGIT(a*b+carry+*prod); return HIGH-DIGIT(a*b+carry+*prod); */ Digit_t DigitMultSub(Digit_t *res, Digit_t a, Digit_t b, Digit_t carry); /* *res=RESULT; return RESULT; where RESULT, CARRY are defined by: 2^BitsPerDigit > CARRY*2^BitsPerDigit + *prod - a*b - carry == RESULT >= 0 */ Digit_t DigitDiv(Digit_t *quot, Digit_t h, Digit_t l, Digit_t d); /* Suppose: d>0 and h<d *quot=QUOT; return REM; where QUOT, REM are defined by: h*2^BitsPerDigit + l == d*QUOT + REM, 2^BitsPerDigit > REM >= 0 */ These functions are sufficient to build a multiple digit arithmetic, but due to function call overhead they are too slow. (More or less, depending on the processor; on a Sparc their performance is not that bad.) To get rid of this overhead we have to make an assumption about the way multiple digit integers are represented. As e.g. Geddes, Czapor and Labahn pointed out in their book "Algorithms for Computer Algebra" a representation as a vector (array) of digits is the most efficient way and should be preferred over e.g. linked lists. So we use the following functions to deal with vectors of l digits. Digit_t DigitVecAdd(Digit_t *sum, Digit_t *a, Digit_t *b, int l) /* sum[0..l-1] = a[0..l-1] + b[0..l-1]; return CARRY; */ /* This notation means: Add the integers represented by the vectors a and b, write the lowest l digits of the result into the vector sum and return the carry of this operation. */ Digit_t DigitVecSub(Digit_t *diff, Digit_t *a, Digit_t *b, int l) /* diff[0..l-1] = a[0..l-1] - b[0..l-1]; return CARRY; */ I am not sure about having DigitVecAdd or the following function DigitVecAddCarry or both. DigitVecAdd is simpler but I can imagine situations where DigitVecAddCarry is more useful. Digit_t DigitVecAddCarry(Digit_t *sum, Digit_t *a, Digit_t *b, Digit_t carry, int l) /* sum[0..l-1] = a[0..l-1] + b[0..l-1] + carry; return CARRY; */ Digit_t DigitVecSubCarry(Digit_t *diff, Digit_t *a, Digit_t *b, Digit_t carry, int l) /* diff[0..l-1] = a[0..l-1] - b[0..l-1] - carry; return CARRY; */ Digit_t DigitVecMult(Digit_t *res, Digit_t *a, Digit_t m, int l) /* res[0..l-1] = a[0..l-1]*m; return CARRY; */ Digit_t DigitVecMultAdd(Digit_t *res, Digit_t *a, Digit_t m, int l) /* res[0..l-1] += a[0..l-1]*m; return CARRY; */ /* Special function for multiple digit multiplication */ Digit_t DigitVecMultSub(Digit_t *res, Digit_t *a, Digit_t m, int l) /* res[0..l-1] -= a[0..l-1]*m; return CARRY; */ /* Special function for multiple digit division */ Digit_t DigitVecDiv(Digit_t *quot, Digit_t *a, Digit_t d, int l) /* quot[0..l-1] = a[0..l-1]/m; return a[0..l-1]%m; */ Now how would optimized versions of these functions look like? I can report some of my experiences. First it makes not much difference to have the additions/subtractions in assembler. On a Mips processor there is no way of doing it better than with the C commands sum = a+b; carry = (sum<a); as these are just two machine instructions. On a Sparc at least the gcc compiler avoids branches in the above statement and needs three machine instructions. If you take the overhead for function call, loads/stores and loop control into account this is almost as good as possible. Something similar holds for DigitVecDiv, that should just contain calls to DigitDiv as these calls consume very little time compared to the actual division. But for DigitDiv itself it is important to use the best method available. If you just want to substitute the fast machine instructions for multiplication you should start with the DigitMult* functions, and if you are willing to do a bit more assembler you can go to DigitVecMult* now, DigitVecMultAdd being the most important. Optimizing these functions also gives you the chance to do instruction scheduling to get the best out of delay slots or supercsalar execution. Here are a few numbers for multiple precision multiplications of numbers with 80 decimal digits each on a SparcStation 2 with Digit_t == unsigned long, BitsPerDigit == 32: C: 403 (353 using karatsubas trick) DigitMultAdd in asm: 135 DigitVecMultAdd in asm: 115 On a Sparcstation 10-20 with 33 MHz Supersparc processor the best number achieved was 33. This was a speedup of about 10 compared to the C-version. But this is due to the usage of the Sparc version 7 multiplication routine in the C library on the machine I had access to (SunOS 4.1.3) :-) On Mips the corresponding numbers are: r3000 (33MHz) r4000 (50/100 MHz) C: 173 58 DigitMultAdd in asm: 92 38 DigitVecMultAdd in asm: 50 20 I would be interested to hear any comments on the functions chosen, if some are missing (why?), if some are superfluous (why?), and mainly if they meet people's needs for portability and efficiency. Ralf Dentzer IWR Universitaet Heidelberg Im Neuenheimer Feld 368 D-6900 Heidelberg Germany dentzer@kalliope.iwr.uni-heidelberg.de