NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-19 | 42.7 KB

Path: sparky!uunet!kithrup!hoptoad!pacbell.com!mips!swrinde!elroy.jpl.nasa.gov!usc!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.sys.intel
Subject: What you always wanted to know about math coprocessors for 80x86 4/4
Message-ID: <16tnskINNcs1@iraul1.ira.uka.de>
Date: 19 Aug 92 15:03:48 GMT
Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
Lines: 903
NNTP-Posting-Host: irav1.ira.uka.de
X-News-Reader: VMS NEWS 1.23


References

[1]  Schnurer, G.: Zahlenknacker im Vormarsch.
     c't 1992, No. 4, pp. 170-186
[2]  Curnow, H.J.; Wichmann, B.A.: A synthetic benchmark.
     Computer Journal, Vol. 19, No. 1, 1976, pp. 43-49
[3]  Wichmann, B.A.: Validation code for the Whetstone benchmark.
     NPL Report DITC 107/88, National Physical Laboratory, UK, March 1988
[4]  Curnow, H.J.: Whither Whetstone? The Synthetic Benchmark after 15 Years.
     In: Aad van der Steen (ed.): Evaluating Supercomputers.
     London: Chapman and Hall 1990
[5]  Dongarra, J.J.: The Linpack Benchmark: An Explanation.
     In: Aad van der Steen (ed.): Evaluating Supercomputers.
     London: Chapman and Hall 1990
[6]  Dongarra, J.J.: Performance of Various Computers Using Standard
     Linear Equations Software. Report CS-89-85, Computer Science
     Department, University of Tennessee, March 11, 1992
[7]  Huth, N.: Dichtung und Wahrheit oder Datenblatt und Test.
     Design & Elektronik 1990, No. 13, pp. 105-110
[8]  Ungerer, B.: Sockelfolger.
     c't 1990, No. 4, pp. 162-163
[9]  Coonen, J.T.: Contributions to a Proposed Standard for Binary
     Floating-Point Arithmetic.
     Ph.D. thesis, University of California, Berkeley, 1984
[10] IEEE: IEEE Standard for Binary Floating-Point Arithmetic.
     SIGPLAN Notices, Vol. 22, No. 2, 1985, pp. 9-25
[11] IEEE Standard for Binary Floating-Point Arithmetic.
     ANSI/IEEE Std 754-1985.
     New York, NY: Institute of Electrical and Electronics Engineers 1985
[12] FasMath 83D87 Compatibility Report.
     Cyrix Corporation, Nov. 1989, Order No. B2004
[13] FasMath 83D87 Accuracy Report.
     Cyrix Corporation, July 1990, Order No. B2002
[14] FasMath 83D87 Benchmark Report.
     Cyrix Corporation, June 1990, Order No. B2004
[15] FasMath 83D87 User's Manual.
     Cyrix Corporation, June 1990, Order No. L2001-003
[16] Brent, R.P.: A FORTRAN multiple-precision arithmetic package.
     ACM Transactions on Mathematical Software, Vol. 4, No. 1,
     March 1978, pp. 57-70
[17] 387DX User's Manual, Programmer's Reference.
     Intel Corporation, 1989, Order No. 231917-002
[18] Volder, J.E.: The CORDIC Trigonometric Computing Technique.
     IRE Transactions on Electronic Computers, Vol. EC-8, No. 5,
     September 1959, pp. 330-334
[19] Walther, J.S.: A unified algorithm for elementary functions.
     AFIPS Conference Proceedings, Vol. 38, SJCC 1971, pp. 379-385
[20] Esser, R.; Kremer, F.; Schmidt, W.G.: Testrechnungen auf der
     IBM 3090E mit Vektoreinrichtung. Arbeitsbericht RRZK-8803,
     Regionales Rechenzentrum an der Universität zu Köln, February 1988
[21] McMahon, H.H.: The Livermore Fortran Kernels: A test of the
     numerical performance range. Technical Report UCRL-53745,
     Lawrence Livermore National Laboratory, USA, December 1986
[22] Nave, R.: Implementation of Transcendental Functions on a
     Numerics Processor. Microprocessing and Microprogramming,
     Vol. 11, No. 3-4, March-April 1983, pp. 221-225
[23] Yuen, A.K.: Intel's Floating-Point Processors.
     Electro/88 Conference Record, Boston, MA, USA, 10-12 May 1988,
     pp. 48/5-1 - 48/5-7
[24] Stiller, A.; Ungerer, B.: Ausgerechnet.
     c't 1990, No. 1, pp. 90-92
[25] Rosch, W.L.: Handfeste Hilfe oder Seifenblase?
     PC Professionell, June 1991, pp. 214-237
[26] Intel 80286 Hardware Reference Manual.
     Intel Corporation, 1987, Order No. 210760-002
[27] AMD 80C287 80-bit CMOS Numeric Processor.
     Advanced Micro Devices, June 1989, Order No. 11671B/0
[28] Intel RapidCAD(tm) Engineering CoProcessor Performance Brief.
     Intel Corporation, 1992
[29] i486(tm) Microprocessor Performance Report.
     Intel Corporation, April 1990, Order No. 240734-001
[30] Intel486(tm) DX2 Microprocessor Performance Brief.
     Intel Corporation, March 1992, Order No. 241254-001
[31] Abacus 3167 Floating-Point Coprocessor Data Book.
     Weitek Corporation, July 1990, DOC No. 9030
[32] WTL 4167 Floating-Point Coprocessor Data Book.
     Weitek Corporation, July 1989, DOC No. 8943
[33] Abacus Software Designer's Guide.
     Weitek Corporation, September 1989, DOC No. 8967
[34] Stiller, A.: Cache & Carry.
     c't 1992, No. 6, pp. 118-130
[35] Stiller, A.: Cache & Carry, Teil 2.
     c't 1992, No. 7, pp. 28-34
[36] Palmer, J.F.; Morse, S.P.: Die mathematischen Grundlagen der
     Numerik-Prozessoren 8087/80287.
     München: tewi 1985
[37] 80C187 80-bit Math Coprocessor Data Sheet.
     Intel Corporation, September 1989, Order No. 270640-003
[38] IIT-2C87 80-bit Numeric Co-Processor Data Sheet.
     IIT, May 1990
[39] Engineering note 4x4 matrix multiply transformation.
     IIT, 1989
[40] Tscheuschner, E.: 4 mal 4 auf einen Streich.
     c't 1990, No. 3, pp. 266-276
[41] Goldberg, D.: Computer Arithmetic.
     In: Hennessy, J.L.; Patterson, D.A.: Computer Architecture:
     A Quantitative Approach. San Mateo, CA: Morgan Kaufmann 1990
[42] 8087 Math Coprocessor Data Sheet.
     Intel Corporation, October 1989, Order No. 205835-007
[43] 8086/8088 User's Manual, Programmer's and Hardware Reference.
     Intel Corporation, 1989, Order No. 240487-001
[44] 80286 and 80287 Programmer's Reference Manual.
     Intel Corporation, 1987, Order No. 210498-005
[45] 80287XL/XLT CHMOS III Math Coprocessor Data Sheet.
     Intel Corporation, May 1990, Order No. 290376-001
[46] Cyrix FasMath(tm) 82S87 Coprocessor Data Sheet.
     Cyrix Corporation, 1991, Document 94018-00 Rev. 1.0
[47] IIT-3C87 80-bit Numeric Co-Processor Data Sheet.
     IIT, May 1990
[48] 486(tm)SX(tm) Microprocessor/ 487(tm)SX(tm) Math CoProcessor
     Data Sheet. Intel Corporation, April 1991, Order No. 240950-001
[49] Schnurer, G.: Die große Verlade.
     c't 1991, No. 7, pp. 55-57
[50] Schnurer, G.: Eine 4 für alle.
     c't 1991, No. 6, p. 25
[51] Intel486(tm)DX Microprocessor Data Book.
     Intel Corporation, June 1991, Order No. 240440-004
[52] i486(tm) Microprocessor Hardware Reference Manual.
     Intel Corporation, 1990, Order No. 240552-001
[53] i486(tm) Microprocessor Programmer's Reference Manual.
     Intel Corporation, 1990, Order No. 240486-001
[54] Ungerer, B.: Kalte Hüte.
     c't 1992, No. 8, pp. 140-144
[55] Ungerer, B.: Heiße Sache.
     c't 1991, No. 4, pp. 104-108
[56] Rosch, W.L.: Handfeste Hilfe oder Seifenblase?
     PC Professionell, June 1991, pp. 214-237
[57] Niederkrüger, W.: Lebendige Vergangenheit.
     c't 1990, No. 12, pp. 114-116
[58] ULSI Math*Co Advanced Math Coprocessor Technical Specification.
     ULSI System, 5/92, Rev. E
[59] 387(tm)DX Math CoProcessor Data Sheet.
     Intel Corporation, September 1990, Order No. 240448-003
[60] 387(tm) Numerics Coprocessor Extension Data Sheet.
     Intel Corporation, February 1989, Order No. 231920-005
[61] Koren, I.; Zinaty, O.: Evaluating Elementary Functions in a
     Numerical Coprocessor Based on Rational Approximations.
     IEEE Transactions on Computers, Vol. C-39, No. 8, August 1990,
     pp. 1030-1037
[62] 387(tm) SX Math CoProcessor Data Sheet.
     Intel Corporation, November 1989, Order No. 240225-005
[63] Frenkel, G.: Coprocessors Speed Numeric Operations.
     PC-Week, August 27, 1990
[64] Schnurer, G.; Stiller, A.: Auto-Matt.
     c't 1991, No. 10, pp. 94-96
[65] Grehan, R.: FPU Face-Off.
     Byte, November 1990, pp. 194-200
[66] Tang, P.T.P.: Testing Computer Arithmetic by Elementary Number
     Theory. Preprint MCS-P84-0889, Mathematics and Computer Science
     Division, Argonne National Laboratory, August 1989
[67] Ferguson, W.E.: Selecting math coprocessors.
     IEEE Spectrum, July 1991, pp. 38-41
[68] Schnabel, J.: Viermal 387.
     Computer Persönlich 1991, No. 22, pp. 153-156
[69] Hofmann, J.: Starke Rechenknechte.
     mc 1990, No. 7, pp. 64-67
[70] Woerrlein, H.; Hinnenberg, R.: Die Lust an der Power.
     Computer Live 1991, No. 10, pp. 138-149


Manufacturers' addresses

Intel Corporation
3065 Bowers Avenue
Santa Clara, CA 95051
USA

IIT Integrated Information Technology, Inc.
2540 Mission College Blvd.
Santa Clara, CA 95054
USA

ULSI Systems, Inc.
58 Daggett Drive
San Jose, CA 95134
USA

Chips & Technologies, Inc.
3050 Zanker Road
San Jose, CA 95134
USA

Weitek Corporation
1060 East Arques Avenue
Sunnyvale, CA 94086
USA

AMD Advanced Micro Devices, Inc.
901 Thompson Place
P.O.B. 3453
Sunnyvale, CA 94088-3453
USA

Cyrix Corporation
P.O.B. 850118
Richardson, TX 75085
USA


Appendix A

{$N+,E+}

PROGRAM PCtrl;

VAR B,c: EXTENDED;
    Precision, L: WORD;

PROCEDURE SetPrecisionControl (Precision: WORD);
(* This procedure sets the internal precision of the NDP. Available *)
(* precision values: 0  -  24 bits (SINGLE)                         *)
(*                   1  -  n.a. (mapped to single)                  *)
(*                   2  -  53 bits (DOUBLE)                         *)
(*                   3  -  64 bits (EXTENDED)                       *)

VAR CtrlWord: WORD;

BEGIN {SetPrecisionControl}
   IF Precision = 1 THEN
      Precision := 0;
   Precision := Precision SHL 8; { make mask for PC field in ctrl word}
   ASM
      FSTCW    [CtrlWord]        { store NDP control word            }
      MOV      AX, [CtrlWord]    { load control word into CPU        }
      AND      AX, 0FCFFh        { mask out precision control field  }
      OR       AX, [Precision]   { set desired precision in PC field }
      MOV      [CtrlWord], AX    { store new control word            }
      FLDCW    [CtrlWord]        { set new precision control in NDP  }
   END;
END; {SetPrecisionControl}

BEGIN {main}
   FOR Precision := 1 TO 3 DO BEGIN
      B := 1.2345678901234567890;
      SetPrecisionControl (Precision);
      FOR L := 1 TO 20 DO BEGIN
         B := Sqrt (B);
      END;
      FOR L := 1 TO 20 DO BEGIN
         B := B*B;
      END;
      SetPrecisionControl (3);   { full precision for printout }
      WriteLn (Precision, B:28);
   END;
END.

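The effect that PCtrl measures can be reproduced without an x87 by emulating the precision setting in software. The following Python sketch is an illustration only and not part of the original posting. It assumes that rounding a double-precision result of Sqrt or of a multiplication to single precision gives the same value as performing the operation with the precision control set to 24 bits. The helper names to_single and roots_then_squares are made up for this example, and the 64-bit EXTENDED setting cannot be reproduced with Python floats.

```python
import math
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def roots_then_squares(start, steps, rnd):
    """Take `steps` square roots, then square `steps` times.

    `rnd` is applied after every operation and stands in for the
    precision control field of the coprocessor (an emulation, not the
    real control word).
    """
    b = rnd(start)
    for _ in range(steps):
        b = rnd(math.sqrt(b))
    for _ in range(steps):
        b = rnd(b * b)
    return b

START = 1.2345678901234567890

single_result = roots_then_squares(START, 20, to_single)    # 24-bit mantissa
double_result = roots_then_squares(START, 20, lambda x: x)  # 53-bit mantissa

print("single:", single_result)
print("double:", double_result)
```

With 24 bits the value left after 20 square roots is only one or two units in the last place away from 1, so the squarings cannot recover the starting value. With 53 bits the starting value should come back to roughly nine or ten decimal digits.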
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

{$N+,E+}

PROGRAM RCtrl;

VAR B,c: EXTENDED;
    RoundingMode, L: WORD;

PROCEDURE SetRoundingMode (RCMode: WORD);
(* This procedure selects one of four available rounding modes *)
(* 0  -  Round to nearest (default)                             *)
(* 1  -  Round down (towards negative infinity)                 *)
(* 2  -  Round up (towards positive infinity)                   *)
(* 3  -  Chop (truncate, round towards zero)                    *)

VAR CtrlWord: WORD;

BEGIN
   RCMode := RCMode SHL 10;      { make mask for RC field in control word}
   ASM
      FSTCW    [CtrlWord]        { store NDP control word               }
      MOV      AX, [CtrlWord]    { load control word into CPU           }
      AND      AX, 0F3FFh        { mask out rounding control field      }
      OR       AX, [RCMode]      { set desired rounding mode in RC field}
      MOV      [CtrlWord], AX    { store new control word               }
      FLDCW    [CtrlWord]        { set new rounding control in NDP      }
   END;
END;

BEGIN
   FOR RoundingMode := 0 TO 3 DO BEGIN
      B := 1.2345678901234567890e100;
      SetRoundingMode (RoundingMode);
      FOR L := 1 TO 51 DO BEGIN
         B := Sqrt (B);
      END;
      FOR L := 1 TO 51 DO BEGIN
         B := -B*B;
      END;
      SetRoundingMode (0);       { round to nearest for printout }
      WriteLn (RoundingMode, B:28);
   END;
END.

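Python offers no portable way to change the hardware rounding mode, so the following sketch only emulates the four modes of RCtrl in double precision. It is not part of the original posting. It assumes that math.sqrt and float multiplication are correctly rounded to nearest, compares each result with the exact value using rational arithmetic, and moves one unit in the last place with math.nextafter (Python 3.9 or later) where the selected mode requires it. All function names are made up for this example, and the numbers differ from the EXTENDED results of the Pascal program.

```python
import math
from fractions import Fraction

MODES = ("nearest", "down", "up", "chop")

def sign(q):
    return (q > 0) - (q < 0)

def directed(nearest, exact_cmp, mode):
    """Turn a round-to-nearest result into a directed-rounding result.

    exact_cmp is the sign of (exact result - nearest).
    """
    if mode == "chop":  # towards zero: "down" for positive, "up" for negative
        positive = nearest > 0 or (nearest == 0 and exact_cmp > 0)
        mode = "down" if positive else "up"
    if mode == "down" and exact_cmp < 0:
        return math.nextafter(nearest, -math.inf)
    if mode == "up" and exact_cmp > 0:
        return math.nextafter(nearest, math.inf)
    return nearest

def sqrt_mode(x, mode):
    r = math.sqrt(x)  # assumed correctly rounded to nearest
    # for r >= 0: sqrt(x) > r exactly when x > r*r
    return directed(r, sign(Fraction(x) - Fraction(r) ** 2), mode)

def mul_mode(a, b, mode):
    p = a * b  # rounded to nearest by the hardware
    return directed(p, sign(Fraction(a) * Fraction(b) - Fraction(p)), mode)

def experiment(mode, steps=51):
    b = 1.2345678901234567890e100
    for _ in range(steps):
        b = sqrt_mode(b, mode)
    for _ in range(steps):
        b = -mul_mode(b, b, mode)  # the sign change itself is exact
    return b

results = {mode: experiment(mode) for mode in MODES}
for mode in MODES:
    print(mode, results[mode])
```

Every rounded operation in this experiment has a positive result, because the negation is applied after the product has been rounded. Rounding down and chopping therefore give identical results in this emulation, and rounding up gives the largest magnitude.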
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

{$N+,E+}

PROGRAM DenormTs;

VAR E: EXTENDED;
    D: DOUBLE;
    S: SINGLE;

BEGIN
   WriteLn ('Testing support and printing of denormals');
   WriteLn;
   Write ('Coprocessor is: ');
   CASE Test8087 OF
      0: WriteLn ('Emulator');
      1: WriteLn ('8087 or compatible');
      2: WriteLn ('80287 or compatible');
      3: WriteLn ('80387 or compatible');
   END;
   WriteLn;

   S := 1.18e-38;
   S := S * 3.90625e-3;
   IF S = 0 THEN
      WriteLn ('SINGLE denormals not supported')
   ELSE BEGIN
      WriteLn ('SINGLE denormals supported');
      WriteLn ('SINGLE denormal prints as:   ', S);
      WriteLn ('Denormal should be printed as 4.60943...E-0041');
   END;
   WriteLn;

   D := 2.24e-308;
   D := D * 3.90625e-3;
   IF D = 0 THEN
      WriteLn ('DOUBLE denormals not supported')
   ELSE BEGIN
      WriteLn ('DOUBLE denormals supported');
      WriteLn ('DOUBLE denormal prints as:   ', D);
      WriteLn ('Denormal should be printed as 8.75...E-0311');
   END;
   WriteLn;

   E := 3.37e-4932;
   E := E * 3.90625e-3;
   IF E = 0 THEN
      WriteLn ('EXTENDED denormals not supported')
   ELSE BEGIN
      WriteLn ('EXTENDED denormals supported');
      WriteLn ('EXTENDED denormal prints as: ', E);
      WriteLn ('Denormal should be printed as 1.3164...E-4934');
   END;
END.

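The SINGLE and DOUBLE parts of DenormTs can be cross-checked on any machine with IEEE arithmetic. The Python sketch below is an illustration, not part of the original posting. It uses the same starting values and the same factor 3.90625e-3 (which is 1/256) and checks that the products fall below the smallest normalized number of each format without becoming zero. The helper to_single is made up for this example, and the EXTENDED case is omitted because Python floats are doubles.

```python
import struct
import sys

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

SINGLE_MIN_NORMAL = 2.0 ** -126            # smallest normalized SINGLE
DOUBLE_MIN_NORMAL = sys.float_info.min     # smallest normalized DOUBLE

# S := 1.18e-38; S := S * 3.90625e-3;
s = to_single(to_single(1.18e-38) * 3.90625e-3)

# D := 2.24e-308; D := D * 3.90625e-3;
d = 2.24e-308 * 3.90625e-3

print("SINGLE denormal:", s, "(expected about 4.60943E-41)")
print("DOUBLE denormal:", d, "(expected about 8.75E-311)")
```

A result of exactly zero would correspond to the "denormals not supported" branch of the Pascal program.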
Appendix B

; FILE: APFELM4.ASM
; assemble with MASM /e APFELM4 or TASM /e APFELM4

CODE        SEGMENT BYTE PUBLIC 'CODE'
            ASSUME  CS: CODE

            PAGE    ,120

            PUBLIC  APPLE87;

APPLE87     PROC    NEAR
            PUSH    BP                     ; save caller's base pointer
            MOV     BP, SP                 ; make new frame pointer
            PUSH    DS                     ; save caller's data segment
            PUSH    SI                     ; save register
            PUSH    DI                     ;  variables
            LDS     BX, [BP+04]            ; pointer to parameter record
            FINIT                          ; init 80x87           FSP->R0
            FILD    WORD PTR  [BX+02]      ; maxrad               FSP->R7
            FLD     QWORD PTR [BX+08]      ; qmax                 FSP->R6
            FSUB    QWORD PTR [BX+16]      ; qmax-qmin            FSP->R6
            DEC     WORD PTR  [BX+04]      ; ymax-1
            FIDIV   WORD PTR  [BX+04]      ; (qmax-qmin)/(ymax-1) FSP->R6
            FSTP    QWORD PTR [BX+16]      ; save delta_q         FSP->R7
            FLD     QWORD PTR [BX+24]      ; pmax                 FSP->R6
            FSUB    QWORD PTR [BX+32]      ; pmax-pmin            FSP->R6
            DEC     WORD PTR  [BX+06]      ; xmax-1
            FIDIV   WORD PTR  [BX+06]      ; delta_p              FSP->R6
            MOV     AX, [BX]               ; save maxiter,[BX] needed for
            MOV     [BX+2], AX             ;  80x87 status now
            XOR     BP, BP                 ; y=0
            FLD     QWORD PTR [BX+08]      ; qmax                 FSP->R5
            CMP     WORD PTR  [BX+40], 0   ; fast mode on 8087 desired ?
            JE      yloop                  ; no, normal mode
            FSTCW   [BX]                   ; save NDP control word
            AND     WORD PTR [BX], 0FCFFh  ; set PCTRL = single precision
            FLDCW   [BX]                   ; get back NDP control word
yloop:      XOR     DI, DI                 ; x=0
            FLD     QWORD PTR [BX+32]      ; pmin                 FSP->R4
xloop:      FLDZ                           ; j**2= 0              FSP->R3
            FLDZ                           ; 2ij = 0              FSP->R2
            FLDZ                           ; i**2= 0              FSP->R1
            MOV     CX, [BX+2]             ; maxiter
            MOV     DL, 41h                ; mask for C0 and C3 cond.bits
iteration:  FSUB    ST, ST(2)              ; i**2-j**2            FSP->R1
            FADD    ST, ST(3)              ; i**2-j**2+p = i      FSP->R1
            FLD     ST(0)                  ; duplicate i          FSP->R0
            FMUL    ST(1), ST              ; i**2                 FSP->R0
            FADD    ST, ST(0)              ; 2i                   FSP->R0
            FXCH    ST(2)                  ; 2*i*j                FSP->R0
            FADD    ST, ST(5)              ; 2*i*j+q = j          FSP->R0
            FMUL    ST(2), ST              ; 2*i*j                FSP->R0
            FMUL    ST, ST(0)              ; j**2                 FSP->R0
            FST     ST(3)                  ; save j**2            FSP->R0
            FADD    ST, ST(1)              ; i**2+j**2            FSP->R0
            FCOMP   ST(7)                  ; i**2+j**2 > maxrad?  FSP->R1
            FSTSW   [BX]                   ; save 80x87 cond.code FSP->R1
            TEST    BYTE PTR [BX+1], DL    ; test carry and zero flags
            LOOPNZ  iteration              ; until maxiter if not diverg.
            MOV     DX, CX                 ; number of loops executed
            NEG     CX                     ; carry set if CX <> 0
            ADC     DX, 0                  ; adjust DX if no. of loops<>0

; plot point here (DI = X, BP = y, DX has the color)

            FSTP    ST(0)                  ; pop i**2             FSP->R2
            FSTP    ST(0)                  ; pop 2ij              FSP->R3
            FSTP    ST(0)                  ; pop j**2             FSP->R4
            FADD    ST,ST(2)               ; p=p+delta_p          FSP->R4
            INC     DI                     ; x:=x+1
            CMP     DI, [BX+6]             ; x > xmax ?
            JBE     xloop                  ; no, continue on same line
            FSTP    ST(0)                  ; pop p                FSP->R5
            FSUB    QWORD PTR [BX+16]      ; q=q-delta_q          FSP->R5
            INC     BP                     ; y:=y+1
            CMP     BP, [BX+4]             ; y > ymax ?
            JBE     yloop                  ; no, picture not done yet
groesser:   POP     DI                     ; restore
            POP     SI                     ;  register variables
            POP     DS                     ; restore caller's data segm.
            POP     BP                     ; restore caller's base pointer
            RET     4                      ; pop parameters and return
APPLE87     ENDP
CODE        ENDS
            END

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

UNIT Time;

INTERFACE

FUNCTION Clock: LONGINT;          { same as VMS; time in milliseconds }

IMPLEMENTATION

FUNCTION Clock: LONGINT; ASSEMBLER;
ASM
            PUSH    DS            { save caller's data segment       }
            XOR     DX, DX        { initialize data segment to       }
            MOV     DS, DX        {  access ticker counter           }
            MOV     BX, 46Ch      { offset of ticker counter in segm.}
            MOV     DX, 43h       { timer chip control port          }
            MOV     AL, 4         { freeze timer 0                   }
            PUSHF                 { save caller's int flag setting   }
            STI                   { allow update of ticker counter   }
            LES     DI, DS:[BX]   { read BIOS ticker counter         }
            OUT     DX, AL        { latch timer 0                    }
            LDS     SI, DS:[BX]   { read BIOS ticker counter         }
            IN      AL, 40h       { read latched timer 0 lo-byte     }
            MOV     AH, AL        { save lo-byte                     }
            IN      AL, 40h       { read latched timer 0 hi-byte     }
            POPF                  { restore caller's int flag        }
            XCHG    AL, AH        { correct order of hi and lo       }
            MOV     CX, ES        { ticker counter 1 in CX:DI:AX     }
            CMP     DI, SI        { ticker counter updated ?         }
            JE      @no_update    { no                               }
            OR      AX, AX        { update before timer freeze ?     }
            JNS     @no_update    { no                               }
            MOV     DI, SI        { use second                       }
            MOV     CX, DS        {  ticker counter                  }
@no_update: NOT     AX            { counter counts down              }
            MOV     BX, 36EDh     { load multiplier                  }
            MUL     BX            { W1 * M                           }
            MOV     SI, DX        { save W1 * M (hi)                 }
            MOV     AX, BX        { get M                            }
            MUL     DI            { W2 * M                           }
            XCHG    BX, AX        { AX = M, BX = W2 * M (lo)         }
            MOV     DI, DX        { DI = W2 * M (hi)                 }
            ADD     BX, SI        { accumulate                       }
            ADC     DI, 0         {  result                          }
            XOR     SI, SI        { load zero                        }
            MUL     CX            { W3 * M                           }
            ADD     AX, DI        { accumulate                       }
            ADC     DX, SI        {  result in DX:AX:BX              }
            MOV     DH, DL        { move result                      }
            MOV     DL, AH        {  from DL:AX:BX                   }
            MOV     AH, AL        {   to                             }
            MOV     AL, BH        {    DX:AX:BH                      }
            MOV     DI, DX        { save result                      }
            MOV     CX, AX        {  in DI:CX                        }
            MOV     AX, 25110     { calculate correction             }
            MUL     DX            {  factor                          }
            SUB     CX, DX        { subtract correction              }
            SBB     DI, SI        {  factor                          }
            XCHG    AX, CX        { result back                      }
            MOV     DX, DI        {  to DX:AX                        }
            POP     DS            { restore caller's data segment    }
END;

BEGIN
   Port [$43] := $34;   { need rate generator, not square wave}
   Port [$40] := 0;     {  generator as prog. by some BIOSes  }
   Port [$40] := 0;     {   for timer 0                       }
END. { Time }

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

{$A+,B-,R-,I-,V-,N+,E+}

PROGRAM PeakFlop;

USES Time;

TYPE ParamRec = RECORD
                   MaxIter, MaxRad, YMax, XMax: WORD;
                   Qmax, Qmin, Pmax, Pmin: DOUBLE;
                   FastMod: WORD;
                   PlotFkt: POINTER;
                   FLOPS:   LONGINT;
                END;

VAR Param: ParamRec;
    Start: LONGINT;

{$L APFELM4.OBJ}

PROCEDURE Apple87 (VAR Param: ParamRec); EXTERNAL;

BEGIN
   WITH Param DO BEGIN
      MaxIter:= 50;
      MaxRad := 30;
      YMax   := 30;
      XMax   := 30;
      Pmin   :=-2.1;
      Pmax   := 1.1;
      Qmin   :=-1.2;
      Qmax   := 1.2;
      FastMod:= Word (FALSE);
      PlotFkt:= NIL;
      Flops  := 0;
   END;
   Start := Clock;
   Apple87 (Param);            { executes 104002 FLOP         }
   Start := Clock - Start;     { elapsed time in milliseconds }
   WriteLn ('Peak-MFLOPS: ', 104.002 / Start);
END.

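The integer arithmetic at the end of Clock is hard to follow in register form, so here is a Python sketch of the same steps. It is my reading of the listing and not part of the original posting. It assumes that the timer input clock is the nominal 1193182 Hz of a PC, a figure the listing does not state. The names TIMER_HZ, split_clocks, clock_ms and mflops are made up for this example.

```python
TIMER_HZ = 1193182   # assumed nominal timer input clock in Hz
M = 0x36ED           # multiplier from the listing (14061)
C = 25110            # correction constant from the listing

def split_clocks(clocks):
    """Hypothetical helper: express a clock count as the pair
    (BIOS tick count, latched down-counter value)."""
    return clocks >> 16, 0xFFFF - (clocks & 0xFFFF)

def clock_ms(bios_ticks, latched):
    """Mirror the arithmetic that follows the @no_update label."""
    w1 = ~latched & 0xFFFF               # NOT AX: the counter counts down
    t = (bios_ticks << 16) | w1          # 48-bit clock count in CX:DI:AX
    r = ((t * M) >> 24) & 0xFFFFFFFF     # product with M, shifted right 24 bits
    return r - (((r >> 16) * C) >> 16)   # subtract the correction term

def mflops(flop, elapsed_ms):
    """MFLOPS from an operation count and a time in milliseconds."""
    return flop / (elapsed_ms * 1000.0)

one_second = clock_ms(*split_clocks(TIMER_HZ))
one_hour = clock_ms(*split_clocks(3600 * TIMER_HZ))
print(one_second, one_hour)
print(mflops(104002, 100))   # same value as 104.002 / Start for Start = 100
```

Under the stated assumption, M / 2**24 is slightly larger than 1000 / TIMER_HZ, and the term built from 25110 removes most of the excess. The result stays within a few milliseconds of the true value over an hour.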
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

; FILE: M4X4.ASM
;
; assemble with TASM /e M4X4 or MASM /e M4X4

CODE        SEGMENT BYTE PUBLIC 'CODE'
            ASSUME  CS:CODE

            PUBLIC  MUL_4x4
            PUBLIC  IIT_MUL_4x4

FSBP0       EQU     DB 0DBh, 0E8h          ; declare special IIT
FSBP1       EQU     DB 0DBh, 0EBh          ;  instructions
FSBP2       EQU     DB 0DBh, 0EAh
F4X4        EQU     DB 0DBh, 0F1h

;---------------------------------------------------------------------
;
; MUL_4x4 multiplies a four-by-four matrix by an array of
; four-dimensional vectors. This operation is needed for 3D
; transformations in graphics data processing. There are arrays for
; each component of a vector. Thus there is an array containing all
; the x components, another containing all the y components and so on.
; Each component is an 8 byte IEEE floating point number. Two indices
; into the array of vectors are given. The first is the index of the
; vector that will be processed first, the second is the index of the
; vector processed last.
;
;---------------------------------------------------------------------

MUL_4x4     PROC    NEAR

AddrX       EQU     DWORD PTR [BP+24]      ; address of X component array
AddrY       EQU     DWORD PTR [BP+20]      ; address of Y component array
AddrZ       EQU     DWORD PTR [BP+16]      ; address of Z component array
AddrW       EQU     DWORD PTR [BP+12]      ; address of W component array
AddrT       EQU     DWORD PTR [BP+8]       ; addr. of 4x4 transform. mat.
F           EQU     WORD PTR  [BP+6]       ; first vector to process
K           EQU     WORD PTR  [BP+4]       ; last vector to process
RetAddr     EQU     WORD PTR  [BP+2]       ; return address saved by call
SavdBP      EQU     WORD PTR  [BP+0]       ; saved frame pointer
SavdDS      EQU     WORD PTR  [BP-2]       ; caller's data segment

            PUSH    BP                     ; save TURBO-Pascal frame pointer
            MOV     BP, SP                 ; new frame pointer
            PUSH    DS                     ; save TURBO-Pascal data segment

            MOV     CX, K                  ; final index
            SUB     CX, F                  ; final index - start index
            JNC     $ok                    ; must not
            JMP     $nothing               ;  be negative
$ok:        INC     CX                     ; number of elements

            MOV     SI, F                  ; init offset into arrays
            SHL     SI, 1                  ; each
            SHL     SI, 1                  ;  element
            SHL     SI, 1                  ;   has 8 bytes

            LDS     DI, AddrT              ; addr. of transformation mat.
            FLD     QWORD PTR [DI]         ; load a[0,0] = R7
            FLD     QWORD PTR [DI+8]       ; load a[0,1] = R6

$mat_mul:   LES     BX, AddrX              ; addr. of x component array
            FLD     QWORD PTR ES:[BX+SI]   ; load x[a] = R5
            LES     BX, AddrY              ; addr. of y component array
            FLD     QWORD PTR ES:[BX+SI]   ; load y[a] = R4
            LES     BX, AddrZ              ; addr. of z component array
            FLD     QWORD PTR ES:[BX+SI]   ; load z[a] = R3
            LES     BX, AddrW              ; addr. of w component array
            FLD     QWORD PTR ES:[BX+SI]   ; load w[a] = R2

            FLD     ST(5)                  ; load a[0,0] = R1
            FMUL    ST, ST(4)              ; a[0,0] * x[a] = R1
            FLD     ST(5)                  ; load a[0,1] = R0
            FMUL    ST, ST(4)              ; a[0,1] * y[a] = R0
            FADDP   ST(1), ST              ; a[0,0]*x[a]+a[0,1]*y[a]=R1
            FLD     QWORD PTR [DI+16]      ; load a[0,2] = R0
            FMUL    ST, ST(3)              ; a[0,2] * z[a] = R0
            FADDP   ST(1), ST              ; a[0,0]*x[a]...a[0,2]*z[a]=R1
            FLD     QWORD PTR [DI+24]      ; load a[0,3] = R0
            FMUL    ST, ST(2)              ; a[0,3] * w[a] = R0
            FADDP   ST(1), ST              ; a[0,0]*x[a]...a[0,3]*w[a]=R1
            LES     BX, AddrX              ; get address of x vector
            FSTP    QWORD PTR ES:[BX+SI]   ; write new x[a]

            FLD     QWORD PTR [DI+32]      ; load a[1,0] = R1
            FMUL    ST, ST(4)              ; a[1,0] * x[a] = R1
            FLD     QWORD PTR [DI+40]      ; load a[1,1] = R0
            FMUL    ST, ST(4)              ; a[1,1] * y[a] = R0
            FADDP   ST(1), ST              ; a[1,0]*x[a]+a[1,1]*y[a]=R1
            FLD     QWORD PTR [DI+48]      ; load a[1,2] = R0
            FMUL    ST, ST(3)              ; a[1,2] * z[a] = R0
            FADDP   ST(1), ST              ; a[1,0]*x[a]...a[1,2]*z[a]=R1
            FLD     QWORD PTR [DI+56]      ; load a[1,3] = R0
            FMUL    ST, ST(2)              ; a[1,3] * w[a] = R0
            FADDP   ST(1), ST              ; a[1,0]*x[a]...a[1,3]*w[a]=R1
            LES     BX, AddrY              ; get address of y vector
            FSTP    QWORD PTR ES:[BX+SI]   ; write new y[a]

            FLD     QWORD PTR [DI+64]      ; load a[2,0] = R1
            FMUL    ST, ST(4)              ; a[2,0] * x[a] = R1
            FLD     QWORD PTR [DI+72]      ; load a[2,1] = R0
            FMUL    ST, ST(4)              ; a[2,1] * y[a] = R0
            FADDP   ST(1), ST              ; a[2,0]*x[a]+a[2,1]*y[a]=R1
            FLD     QWORD PTR [DI+80]      ; load a[2,2] = R0
            FMUL    ST, ST(3)              ; a[2,2] * z[a] = R0
            FADDP   ST(1), ST              ; a[2,0]*x[a]...a[2,2]*z[a]=R1
            FLD     QWORD PTR [DI+88]      ; load a[2,3] = R0
            FMUL    ST, ST(2)              ; a[2,3] * w[a] = R0
            FADDP   ST(1), ST              ; a[2,0]*x[a]...a[2,3]*w[a]=R1
            LES     BX, AddrZ              ; get address of z vector
            FSTP    QWORD PTR ES:[BX+SI]   ; write new z[a]

            FLD     QWORD PTR [DI+96]      ; load a[3,0] = R1
            FMULP   ST(4), ST              ; a[3,0] * x[a] = R5
            FLD     QWORD PTR [DI+104]     ; load a[3,1] = R1
            FMULP   ST(3), ST              ; a[3,1] * y[a] = R4
            FLD     QWORD PTR [DI+112]     ; load a[3,2] = R1
            FMULP   ST(2), ST              ; a[3,2] * z[a] = R3
            FLD     QWORD PTR [DI+120]     ; load a[3,3] = R1
            FMULP   ST(1), ST              ; a[3,3] * w[a] = R2
            FADDP   ST(1), ST              ; a[3,3]*w[a]+a[3,2]*z[a]=R3
            FADDP   ST(1), ST              ; a[3,3]*w[a]...a[3,1]*y[a]=R4
            FADDP   ST(1), ST              ; a[3,3]*w[a]...a[3,0]*x[a]=R5
            LES     BX, AddrW              ; get address of w vector
            FSTP    QWORD PTR ES:[BX+SI]   ; write new w[a]

            ADD     SI, 8                  ; new offset into arrays
            DEC     CX                     ; decrement element counter
            JZ      $done                  ; no elements left, done
            JMP     $mat_mul               ; transform next vector

$done:      FSTP    ST(0)                  ; clear
            FSTP    ST(0)                  ;  FPU stack

$nothing:   POP     DS                     ; restore TP data segment
            POP     BP                     ; restore TP frame pointer
            RET     24                     ; pop parameters and return

MUL_4x4     ENDP

;---------------------------------------------------------------------
;
; IIT_MUL_4x4 multiplies a four-by-four matrix by an array of
; four-dimensional vectors. This operation is needed for 3D
; transformations in graphics data processing. There are arrays for
; each component of a vector. Thus there is an array containing all
; the x components, another containing all the y components and so on.
; Each component is an 8 byte IEEE floating point number. Two indices
; into the array of vectors are given. The first is the index of the
; vector that will be processed first, the second is the index of the
; vector processed last. This subroutine uses the special instructions
; only available on IIT coprocessors to provide fast matrix multiply
; capabilities. So make sure to use it only on IIT coprocessors.
;
;---------------------------------------------------------------------

IIT_MUL_4x4 PROC    NEAR

AddrX       EQU     DWORD PTR [BP+24]      ; address of X component array
AddrY       EQU     DWORD PTR [BP+20]      ; address of Y component array
AddrZ       EQU     DWORD PTR [BP+16]      ; address of Z component array
AddrW       EQU     DWORD PTR [BP+12]      ; address of W component array
AddrT       EQU     DWORD PTR [BP+8]       ; addr. of 4x4 transf. matrix
F           EQU     WORD PTR  [BP+6]       ; first vector to process
K           EQU     WORD PTR  [BP+4]       ; last vector to process
RetAddr     EQU     WORD PTR  [BP+2]       ; return address saved by call
SavdBP      EQU     WORD PTR  [BP+0]       ; saved frame pointer
SavdDS      EQU     WORD PTR  [BP-2]       ; caller's data segment
Ctrl87      EQU     WORD PTR  [BP-4]       ; caller's 80x87 control word

            PUSH    BP                     ; save TURBO-Pascal frame ptr
            MOV     BP, SP                 ; new frame pointer
            PUSH    DS                     ; save TURBO-Pascal data seg.
            SUB     SP, 2                  ; make local variable
            FSTCW   [Ctrl87]               ; save 80x87 ctrl word
            LES     SI, AddrT              ; ptr to transformation matrix
            FINIT                          ; initialize coprocessor
            FSBP2                          ; set register bank 2
            FLD     QWORD PTR ES:[SI]      ; load a[0,0]
            FLD     QWORD PTR ES:[SI+32]   ; load a[1,0]
            FLD     QWORD PTR ES:[SI+64]   ; load a[2,0]
            FLD     QWORD PTR ES:[SI+96]   ; load a[3,0]
            FLD     QWORD PTR ES:[SI+8]    ; load a[0,1]
            FLD     QWORD PTR ES:[SI+40]   ; load a[1,1]
            FLD     QWORD PTR ES:[SI+72]   ; load a[2,1]
            FLD     QWORD PTR ES:[SI+104]  ; load a[3,1]
            FINIT                          ; initialize coprocessor
            FSBP1                          ; set register bank 1
            FLD     QWORD PTR ES:[SI+16]   ; load a[0,2]
            FLD     QWORD PTR ES:[SI+48]   ; load a[1,2]
            FLD     QWORD PTR ES:[SI+80]   ; load a[2,2]
            FLD     QWORD PTR ES:[SI+112]  ; load a[3,2]
            FLD     QWORD PTR ES:[SI+24]   ; load a[0,3]
            FLD     QWORD PTR ES:[SI+56]   ; load a[1,3]
            FLD     QWORD PTR ES:[SI+88]   ; load a[2,3]
            FLD     QWORD PTR ES:[SI+120]  ; load a[3,3]
                                           ; transformation matrix loaded

            MOV     AX, F                  ; index of first vector
            MOV     DX, K                  ; index of last vector
            MOV     BX, AX                 ; index 1st vector to process
            MOV     CL, 3                  ; component has 8 (2**3) bytes
            SHL     BX, CL                 ; compute offset into arrays
            FINIT                          ; initialize coprocessor
            FSBP0                          ; set register bank 0

$mat_loop:  LES     SI, AddrW              ; addr. of W component array
            FLD     QWORD PTR ES:[SI+BX]   ; W component current vector
            LES     SI, AddrZ              ; addr. of Z component array
            FLD     QWORD PTR ES:[SI+BX]   ; Z component current vector
            LES     SI, AddrY              ; addr. of Y component array
            FLD     QWORD PTR ES:[SI+BX]   ; Y component current vector
            LES     SI, AddrX              ; addr. of X component array
            FLD     QWORD PTR ES:[SI+BX]   ; X component current vector
            F4X4                           ; mul 4x4 matrix by 4x1 vector
            INC     AX                     ; next vector
            MOV     DI, AX                 ; next vector
            SHL     DI, CL                 ; offset of vector into arrays
            FSTP    QWORD PTR ES:[SI+BX]   ; store X comp. of curr. vect.
            LES     SI, AddrY              ; address of Y component array
            FSTP    QWORD PTR ES:[SI+BX]   ; store Y comp. of curr. vect.
            LES     SI, AddrZ              ; address of Z component array
            FSTP    QWORD PTR ES:[SI+BX]   ; store Z comp. of curr. vect.
            LES     SI, AddrW              ; address of W component array
            FSTP    QWORD PTR ES:[SI+BX]   ; store W comp. of curr. vect.
            MOV     BX, DI                 ; ofs nxt vect. in comp. arrays
            CMP     AX, DX                 ; nxt vector past upper bound?
            JLE     $mat_loop              ; no, transform next vector
            FLDCW   [Ctrl87]               ; restore orig 80x87 ctrl word
            ADD     SP, 2                  ; get rid of local variable
            POP     DS                     ; restore TP data segment
            POP     BP                     ; restore TP frame pointer
            RET     24                     ; pop parameters and return
IIT_MUL_4x4 ENDP

CODE        ENDS

            END

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

{$N+,E+}

PROGRAM Trnsform;

USES Time;

CONST VectorLen = 8190;

TYPE  Vector    = ARRAY [0..VectorLen] OF DOUBLE;
      VectorPtr = ^Vector;
      Mat4      = ARRAY [1..4, 1..4] OF DOUBLE;

VAR   X: VectorPtr;
      Y: VectorPtr;
      Z: VectorPtr;
      W: VectorPtr;
      T: Mat4;
      K: INTEGER;
      L: INTEGER;
      First:  INTEGER;
      Last:   INTEGER;
      Start:  LONGINT;
      Elapsed:LONGINT;

PROCEDURE MUL_4X4     (X, Y, Z, W: VectorPtr;
                       VAR T: Mat4; First, Last: INTEGER); EXTERNAL;
PROCEDURE IIT_MUL_4X4 (X, Y, Z, W: VectorPtr;
                       VAR T: Mat4; First, Last: INTEGER); EXTERNAL;

{$L M4X4.OBJ}

BEGIN
   WriteLn ('Test8087 = ', Test8087);
   New (X);
   New (Y);
   New (Z);
   New (W);
   FOR L := 1 TO VectorLen DO BEGIN
      X^ [L] := Random;
      Y^ [L] := Random;
      Z^ [L] := Random;
      W^ [L] := Random;
   END;
   X^ [0] := 1;
   Y^ [0] := 1;
   Z^ [0] := 1;
   W^ [0] := 1;
   FOR K := 1 TO 4 DO BEGIN
      FOR L := 1 TO 4 DO BEGIN
         T [K, L] := (K-1)*4 + L;
      END;
   END;
   First := 0;
   Last  := 8190;
   Start := Clock;
   MUL_4X4 (X, Y, Z, W, T, First, Last);
   { IIT_MUL_4X4 (X, Y, Z, W, T, First, Last); }
   Elapsed := Clock - Start;
   WriteLn ('Number of vectors: ', Last-First+1);
   WriteLn ('Time: ', Elapsed, ' ms');
   WriteLn ('Equivalent to ', (28.0*(Last-First+1)/1e6)/
            (Elapsed*1e-3):0:4, ' MFLOPS');
   WriteLn;
   WriteLn ('Last vector:');
   WriteLn;
   WriteLn (X^[Last]);
   WriteLn (Y^[Last]);
   WriteLn (Z^[Last]);
   WriteLn (W^[Last]);
END.
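
A plain reference version of the transformation is handy for checking the output of either assembly routine. The following Python sketch is not part of the original posting. It uses the same layout of one array per component and the same matrix T[K,L] = (K-1)*4 + L as Trnsform, with 0-based indices. The function names mul_4x4 and mflops are made up for this example. The figure of 28 operations per vector in the MFLOPS formula corresponds to 16 multiplications and 12 additions.

```python
FLOP_PER_VECTOR = 16 + 12   # 16 multiplications and 12 additions

def mul_4x4(x, y, z, w, t, first, last):
    """Transform vectors first..last in place.

    x, y, z and w each hold one component of every vector and t is a
    4x4 matrix given as a list of rows.
    """
    for a in range(first, last + 1):
        old = (x[a], y[a], z[a], w[a])   # read all inputs before storing
        new = [sum(t[row][col] * old[col] for col in range(4))
               for row in range(4)]
        x[a], y[a], z[a], w[a] = new

def mflops(n_vectors, elapsed_ms):
    """Same formula as the WriteLn statement in Trnsform."""
    return (FLOP_PER_VECTOR * n_vectors / 1e6) / (elapsed_ms * 1e-3)

# Matrix as initialized in Trnsform: T[K,L] = (K-1)*4 + L
t = [[float(k * 4 + l + 1) for l in range(4)] for k in range(4)]

# Vector 0 is (1, 1, 1, 1) in Trnsform, so its image is the row sums of T.
x, y, z, w = [1.0], [1.0], [1.0], [1.0]
mul_4x4(x, y, z, w, t, 0, 0)
print(x[0], y[0], z[0], w[0])
print(mflops(8191, 1000))
```

MUL_4x4 adds the terms of the fourth row in the opposite order, so results for arbitrary data may differ from this sketch in the last bit. For the small integers used here every sum is exact.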