- Path: sparky!uunet!kithrup!hoptoad!pacbell.com!mips!swrinde!elroy.jpl.nasa.gov!usc!sol.ctr.columbia.edu!ira.uka.de!uka!uka!news
- From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
- Newsgroups: comp.sys.intel
- Subject: What you always wanted to know about math coprocessors for 80x86 4/4
- Message-ID: <16tnskINNcs1@iraul1.ira.uka.de>
- Date: 19 Aug 92 15:03:48 GMT
- Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
- Lines: 903
- NNTP-Posting-Host: irav1.ira.uka.de
- X-News-Reader: VMS NEWS 1.23
-
-
- References
-
- [1] Schnurer, G.: Zahlenknacker im Vormarsch.
- c't 1992, Heft 4, Seiten 170-186
- [2] Curnow, H.J.; Wichmann, B.A.: A synthetic benchmark.
- Computer Journal, Vol. 19, No. 1, 1976, pp. 43-49
- [3] Wichmann, B.A.: Validation code for the Whetstone benchmark.
- NPL Report DITC 107/88, National Physical Laboratory, UK,
- March 1988
- [4] Curnow, H.J.: Whither Whetstone? The Synthetic Benchmark after
- 15 Years.
- In: Aad van der Steen (ed.): Evaluating Supercomputers.
- London: Chapman and Hall 1990
- [5] Dongarra, J.J.: The Linpack Benchmark: An Explanation.
- In: Aad van der Steen (ed.): Evaluating Supercomputers.
- London: Chapman and Hall 1990
- [6] Dongarra, J.J.: Performance of Various Computers Using Standard
- Linear Equations Software.
- Report CS-89-85, Computer Science Department, University of
- Tennessee, March 11, 1992
- [7] Huth, N.: Dichtung und Wahrheit oder Datenblatt und Test.
- Design & Elektronik 1990, Heft 13, Seiten 105-110
- [8] Ungerer, B.: Sockelfolger.
- c't 1990, Heft 4, Seiten 162-163
- [9] Coonen, J.T.: Contributions to a Proposed Standard for Binary
- Floating-Point Arithmetic
- Ph.D. thesis, University of California, Berkeley, 1984
- [10] IEEE: IEEE Standard for Binary Floating-Point Arithmetic.
- SIGPLAN Notices, Vol. 22, No. 2, 1985, pp. 9-25
- [11] IEEE Standard for Binary Floating-Point Arithmetic.
- ANSI/IEEE Std 754-1985.
- New York, NY: Institute of Electrical and Electronics
- Engineers 1985
- [12] FasMath 83D87 Compatibility Report. Cyrix Corporation, Nov. 1989
- Order No. B2004
- [13] FasMath 83D87 Accuracy Report. Cyrix Corporation, July 1990
- Order No. B2002
- [14] FasMath 83D87 Benchmark Report. Cyrix Corporation, June 1990
- Order No. B2004
- [15] FasMath 83D87 User's Manual. Cyrix Corporation, June 1990
- Order No. L2001-003
- [16] Brent, R.P.: A FORTRAN multiple-precision arithmetic package.
- ACM Transactions on Mathematical Software, Vol. 4, No. 1,
- March 1978, pp. 57-70
- [17] 387DX User's Manual, Programmer's Reference. Intel Corporation,
- 1989
- Order No. 231917-002
- [18] Volder, J.E.: The CORDIC Trigonometric Computing Technique.
- IRE Transactions on Electronic Computers, Vol. EC-8, No. 5,
- September 1959, pp. 330-334
- [19] Walther, J.S.: A unified algorithm for elementary functions.
- AFIPS Conference Proceedings, Vol. 38, SJCC 1971, pp. 379-385
- [20] Esser, R.; Kremer, F.; Schmidt, W.G.: Testrechnungen auf der
- IBM 3090E mit Vektoreinrichtung.
- Arbeitsbericht RRZK-8803, Regionales Rechenzentrum an der
- Universität zu Köln, Februar 1988
- [21] McMahon, F.H.: The Livermore Fortran Kernels: A test of the
- numerical performance range.
- Technical Report UCRL-53745, Lawrence Livermore National
- Laboratory, USA, December 1986
- [22] Nave, R.: Implementation of Transcendental Functions on a Numerics
- Processor.
- Microprocessing and Microprogramming, Vol. 11, No. 3-4,
- March-April 1983, pp. 221-225
- [23] Yuen, A.K.: Intel's Floating-Point Processors.
- Electro/88 Conference Record, Boston, MA, USA, 10-12 May 1988,
- pp. 48/5-1 - 48/5-7
- [24] Stiller, A.; Ungerer, B.: Ausgerechnet.
- c't 1990, Heft 1, Seiten 90-92
- [25] Rosch, W.L.: Handfeste Hilfe oder Seifenblase?
- PC Professionell, Juni 1991, Seiten 214-237
- [26] Intel 80286 Hardware Reference Manual. Intel Corporation, 1987
- Order No. 210760-002
- [27] AMD 80C287 80-bit CMOS Numeric Processor. Advanced Micro Devices,
- June 1989
- Order No. 11671B/0
- [28] Intel RapidCAD(tm) Engineering CoProcessor Performance Brief.
- Intel Corporation, 1992
- [29] i486(tm) Microprocessor Performance Report. Intel Corporation,
- April 1990
- Order No. 240734-001
- [30] Intel486(tm) DX2 Microprocessor Performance Brief. Intel
- Corporation, March 1992
- Order No. 241254-001
- [31] Abacus 3167 Floating-Point Coprocessor Data Book. Weitek
- Corporation, July 1990
- DOC No. 9030
- [32] WTL 4167 Floating-Point Coprocessor Data Book. Weitek
- Corporation, July 1989
- DOC No. 8943
- [33] Abacus Software Designer's Guide. Weitek Corporation,
- September 1989
- DOC No. 8967
- [34] Stiller, A.: Cache & Carry.
- c't 1992, Heft 6, Seiten 118-130
- [35] Stiller, A.: Cache & Carry, Teil 2.
- c't 1992, Heft 7, Seiten 28-34
- [36] Palmer, J.F.; Morse, S.P.: Die mathematischen Grundlagen der
- Numerik-Prozessoren 8087/80287.
- München: tewi 1985
- [37] 80C187 80-bit Math Coprocessor Data Sheet. Intel Corporation,
- September 1989
- Order No. 270640-003
- [38] IIT-2C87 80-bit Numeric Co-Processor Data Sheet. IIT, May 1990
- [39] Engineering note 4x4 matrix multiply transformation. IIT, 1989
- [40] Tscheuschner, E.: 4 mal 4 auf einen Streich.
- c't 1990, Heft 3, Seiten 266-276
- [41] Goldberg, D.: Computer Arithmetic.
- In: Hennessy, J.L.; Patterson, D.A.: Computer Architecture A
- Quantitative Approach. San Mateo, CA: Morgan Kaufmann 1990
- [42] 8087 Math Coprocessor Data Sheet. Intel Corporation, October 1989,
- Order No. 205835-007
- [43] 8086/8088 User's Manual, Programmer's and Hardware Reference.
- Intel Corporation, 1989
- Order No. 240487-001
- [44] 80286 and 80287 Programmer's Reference Manual. Intel Corporation,
- 1987
- Order No. 210498-005
- [45] 80287XL/XLT CHMOS III Math Coprocessor Data Sheet. Intel
- Corporation, May 1990
- Order No. 290376-001
- [46] Cyrix FasMath(tm) 82S87 Coprocessor Data Sheet. Cyrix Corporation,
- 1991
- Document 94018-00 Rev. 1.0
- [47] IIT-3C87 80-bit Numeric Co-Processor Data Sheet. IIT, May 1990
- [48] 486(tm)SX(tm) Microprocessor/ 487(tm)SX(tm) Math CoProcessor
- Data Sheet. Intel Corporation, April 1991.
- Order No. 240950-001
- [49] Schnurer, G.: Die große Verlade.
- c't 1991, Heft 7, Seiten 55-57
- [50] Schnurer, G.: Eine 4 für alle.
- c't 1991, Heft 6, Seite 25
- [51] Intel486(tm)DX Microprocessor Data Book. Intel Corporation,
- June 1991
- Order No. 240440-004
- [52] i486(tm) Microprocessor Hardware Reference Manual. Intel
- Corporation, 1990
- Order No. 240552-001
- [53] i486(tm) Microprocessor Programmer's Reference Manual. Intel
- Corporation, 1990
- Order No. 240486-001
- [54] Ungerer, B.: Kalte Hüte.
- c't 1992, Heft 8, Seiten 140-144
- [55] Ungerer, B.: Heiße Sache.
- c't 1991, Heft 4, Seiten 104-108
- [56] Rosch, W.L.: Handfeste Hilfe oder Seifenblase?
- PC Professionell, Juni 1991, Seiten 214-237
- [57] Niederkrüger, W.: Lebendige Vergangenheit.
- c't 1990, Heft 12, Seiten 114-116
- [58] ULSI Math*Co Advanced Math Coprocessor Technical Specification.
- ULSI Systems, 5/92, Rev. E
- [59] 387(tm)DX Math CoProcessor Data Sheet. Intel Corporation,
- September 1990.
- Order No. 240448-003
- [60] 387(tm) Numerics Coprocessor Extension Data Sheet. Intel
- Corporation, February 1989.
- Order No. 231920-005
- [61] Koren, I.; Zinaty, O.: Evaluating Elementary Functions in a
- Numerical Coprocessor Based on Rational Approximations.
- IEEE Transactions on Computers, Vol. C-39, No. 8, August 1990,
- pp. 1030-1037
- [62] 387(tm) SX Math CoProcessor Data Sheet. Intel Corporation,
- November 1989
- Order No. 240225-005
- [63] Frenkel, G.: Coprocessors Speed Numeric Operations.
- PC-Week, August 27, 1990
- [64] Schnurer, G.; Stiller, A.: Auto-Matt.
- c't 1991, Heft 10, Seiten 94-96
- [65] Grehan, R.: FPU Face-Off.
- Byte, November 1990, pp. 194-200
- [66] Tang, P.T.P.: Testing Computer Arithmetic by Elementary Number
- Theory. Preprint MCS-P84-0889, Mathematics and Computer Science
- Division, Argonne National Laboratory, August 1989
- [67] Ferguson, W.E.: Selecting math coprocessors.
- IEEE Spectrum, July 1991, pp. 38-41
- [68] Schnabel, J.: Viermal 387.
- Computer Persönlich 1991, Heft 22, Seiten 153-156
- [69] Hofmann, J.: Starke Rechenknechte.
- mc 1990, Heft 7, Seiten 64-67
- [70] Woerrlein, H.; Hinnenberg, R.: Die Lust an der Power.
- Computer Live 1991, Heft 10, Seiten 138-149
-
-
-
- Manufacturers' addresses
-
- Intel Corporation
- 3065 Bowers Avenue
- Santa Clara, CA 95051
- USA
-
- IIT Integrated Information Technology, Inc.
- 2540 Mission College Blvd.
- Santa Clara, CA 95054
- USA
-
- ULSI Systems, Inc.
- 58 Daggett Drive
- San Jose, CA 95134
- USA
-
- Chips & Technologies, Inc.
- 3050 Zanker Road
- San Jose, CA 95134
- USA
-
- Weitek Corporation
- 1060 East Arques Avenue
- Sunnyvale, CA 94086
- USA
-
- AMD Advanced Micro Devices, Inc.
- 901 Thompson Place
- P.O.B. 3453
- Sunnyvale, CA 94088-3453
- USA
-
- Cyrix Corporation
- P.O.B. 850118
- Richardson, TX 75085
- USA
-
-
- Appendix A
-
-
- {$N+,E+}
- PROGRAM PCtrl;
-
- VAR B,c: EXTENDED;
- Precision, L: WORD;
-
- PROCEDURE SetPrecisionControl (Precision: WORD);
- (* This procedure sets the internal precision of the NDP. Available *)
- (* precision values: 0 - 24 bits (SINGLE) *)
- (* 1 - n.a. (mapped to single) *)
- (* 2 - 53 bits (DOUBLE) *)
- (* 3 - 64 bits (EXTENDED) *)
-
- VAR CtrlWord: WORD;
-
- BEGIN {SetPrecisionControl}
- IF Precision = 1 THEN
- Precision := 0;
- Precision := Precision SHL 8; { make mask for PC field in ctrl word}
- ASM
- FSTCW [CtrlWord] { store NDP control word }
- MOV AX, [CtrlWord] { load control word into CPU }
- AND AX, 0FCFFh { mask out precision control field }
- OR AX, [Precision] { set desired precision in PC field }
- MOV [CtrlWord], AX { store new control word }
- FLDCW [CtrlWord] { set new precision control in NDP }
- END;
- END; {SetPrecisionControl}
-
- BEGIN {main}
- FOR Precision := 1 TO 3 DO BEGIN
- B := 1.2345678901234567890;
- SetPrecisionControl (Precision);
- FOR L := 1 TO 20 DO BEGIN
- B := Sqrt (B);
- END;
- FOR L := 1 TO 20 DO BEGIN
- B := B*B;
- END;
- SetPrecisionControl (3); { full precision for printout }
- WriteLn (Precision, B:28);
- END;
- END.
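
The PC-field update in SetPrecisionControl and the effect PCtrl measures can be modelled outside the FPU. The Python sketch below is an illustration only, not the author's method. The names set_precision_field, to_single and sqrt_square_roundtrip are hypothetical, 0x037F is assumed as the starting control word (the value FINIT loads on 387-class parts), and single precision is emulated by rounding every result through struct, because Python floats are IEEE doubles. Extended precision cannot be reproduced this way.

```python
import math
import struct

PC_MASK = 0xFCFF   # clears bits 8-9 (precision control), same mask as the listing
PC_SHIFT = 8       # PC field starts at bit 8


def set_precision_field(ctrl_word: int, precision: int) -> int:
    """Return ctrl_word with its PC field replaced.

    precision: 0 = single, 2 = double, 3 = extended; 1 is mapped to 0
    exactly as SetPrecisionControl does.
    """
    if precision == 1:
        precision = 0
    return (ctrl_word & PC_MASK) | (precision << PC_SHIFT)


def to_single(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single (emulation helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt_square_roundtrip(x: float, steps: int, single: bool) -> float:
    """Take `steps` square roots, then square `steps` times, like PCtrl."""
    rnd = to_single if single else (lambda v: v)
    b = rnd(x)
    for _ in range(steps):
        b = rnd(math.sqrt(b))
    for _ in range(steps):
        b = rnd(b * b)
    return b


if __name__ == "__main__":
    start = 1.2345678901234567
    print(hex(set_precision_field(0x037F, 2)))
    print("single:", sqrt_square_roundtrip(start, 20, True))
    print("double:", sqrt_square_roundtrip(start, 20, False))
```

With 20 steps the emulated single-precision round trip should lose most of its digits, while the double-precision one should stay close to the starting value.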
-
-
- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- {$N+,E+}
- PROGRAM RCtrl;
-
- VAR B,c: EXTENDED;
- RoundingMode, L: WORD;
-
-
- PROCEDURE SetRoundingMode (RCMode: WORD);
- (* This procedure selects one of four available rounding modes *)
- (* 0 - Round to nearest (default) *)
- (* 1 - Round down (towards negative infinity) *)
- (* 2 - Round up (towards positive infinity) *)
- (* 3 - Chop (truncate, round towards zero) *)
-
- VAR CtrlWord: WORD;
-
- BEGIN
- RCMode := RCMode SHL 10; { make mask for RC field in control word}
- ASM
- FSTCW [CtrlWord] { store NDP control word }
- MOV AX, [CtrlWord] { load control word into CPU }
- AND AX, 0F3FFh { mask out rounding control field }
- OR AX, [RCMode] { set desired rounding mode in RC field }
- MOV [CtrlWord], AX { store new control word }
- FLDCW [CtrlWord] { set new rounding control in NDP }
- END;
- END;
-
- BEGIN
- FOR RoundingMode := 0 TO 3 DO BEGIN
- B := 1.2345678901234567890e100;
- SetRoundingMode (RoundingMode);
- FOR L := 1 TO 51 DO BEGIN
- B := Sqrt (B);
- END;
- FOR L := 1 TO 51 DO BEGIN
- B := -B*B;
- END;
- SetRoundingMode (0); { round to nearest for printout }
- WriteLn (RoundingMode, B:28);
- END;
- END.
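
The four RC encodings listed in SetRoundingMode can be tried out without an FPU. The following Python sketch is an assumption-laden illustration: it maps the RC values onto the rounding modes of the standard decimal module and rounds a quotient to four significant decimal digits, which is not how the NDP works internally (it rounds binary significands). The names set_rounding_field, RC_TO_DECIMAL and divide_rounded are hypothetical, and 0x037F is assumed as the starting control word.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)

RC_MASK = 0xF3FF   # clears bits 10-11 (rounding control), same mask as the listing
RC_SHIFT = 10      # RC field starts at bit 10

# RC encoding from the listing -> closest decimal-module rounding mode
RC_TO_DECIMAL = {
    0: ROUND_HALF_EVEN,  # round to nearest
    1: ROUND_FLOOR,      # round down (towards negative infinity)
    2: ROUND_CEILING,    # round up (towards positive infinity)
    3: ROUND_DOWN,       # chop (towards zero)
}


def set_rounding_field(ctrl_word: int, rc_mode: int) -> int:
    """Return ctrl_word with its RC field replaced by rc_mode (0..3)."""
    return (ctrl_word & RC_MASK) | (rc_mode << RC_SHIFT)


def divide_rounded(num: int, den: int, rc_mode: int, digits: int = 4) -> Decimal:
    """Divide num by den, rounding to `digits` significant digits in rc_mode."""
    ctx = Context(prec=digits, rounding=RC_TO_DECIMAL[rc_mode])
    return ctx.divide(Decimal(num), Decimal(den))


if __name__ == "__main__":
    for mode in range(4):
        print(mode, hex(set_rounding_field(0x037F, mode)),
              divide_rounded(1, 3, mode), divide_rounded(-1, 3, mode))
```

Only a negative result separates mode 1 from mode 3; for positive results both simply discard the extra digits.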
-
-
- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- {$N+,E+}
-
- PROGRAM DenormTs;
-
- VAR E: EXTENDED;
- D: DOUBLE;
- S: SINGLE;
-
- BEGIN
- WriteLn ('Testing support and printing of denormals');
- WriteLn;
- Write ('Coprocessor is: ');
- CASE Test8087 OF
- 0: WriteLn ('Emulator');
- 1: WriteLn ('8087 or compatible');
- 2: WriteLn ('80287 or compatible');
- 3: WriteLn ('80387 or compatible');
- END;
- WriteLn;
- S := 1.18e-38;
- S := S * 3.90625e-3;
- IF S = 0 THEN
- WriteLn ('SINGLE denormals not supported')
- ELSE BEGIN
- WriteLn ('SINGLE denormals supported');
- WriteLn ('SINGLE denormal prints as: ', S);
- WriteLn ('Denormal should be printed as 4.60943...E-0041');
- END;
- WriteLn;
- D := 2.24e-308;
- D := D * 3.90625e-3;
- IF D = 0 THEN
- WriteLn ('DOUBLE denormals not supported')
- ELSE BEGIN
- WriteLn ('DOUBLE denormals supported');
- WriteLn ('DOUBLE denormal prints as: ', D);
- WriteLn ('Denormal should be printed as 8.75...E-0311');
- END;
- WriteLn;
- E := 3.37e-4932;
- E := E * 3.90625e-3;
- IF E = 0 THEN
- WriteLn ('EXTENDED denormals not supported')
- ELSE BEGIN
- WriteLn ('EXTENDED denormals supported');
- WriteLn ('EXTENDED denormal prints as: ', E);
- WriteLn ('Denormal should be printed as 1.3164...E-4934');
- END;
- END.
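
The same check (scale a value just above the smallest normal number by 2**-8 and see whether the result is flushed to zero) can be sketched in Python for the SINGLE and DOUBLE cases. This is an illustration under assumptions: the function names are hypothetical, single precision is emulated by rounding through struct, the host is assumed not to flush denormals to zero, and the EXTENDED case is omitted because Python has no 80-bit type.

```python
import struct
import sys

SINGLE_MIN_NORMAL = 2.0 ** -126              # smallest normal IEEE single
DOUBLE_MIN_NORMAL = sys.float_info.min       # smallest normal IEEE double
SCALE = 3.90625e-3                           # 2**-8, as in DenormTs


def to_single(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single (emulation helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def single_denormal() -> float:
    """Emulated SINGLE test value: 1.18e-38 scaled by 2**-8."""
    s = to_single(1.18e-38)
    return to_single(s * SCALE)


def double_denormal() -> float:
    """DOUBLE test value: 2.24e-308 scaled by 2**-8."""
    return 2.24e-308 * SCALE


if __name__ == "__main__":
    for name, value, limit in (
        ("SINGLE", single_denormal(), SINGLE_MIN_NORMAL),
        ("DOUBLE", double_denormal(), DOUBLE_MIN_NORMAL),
    ):
        supported = 0.0 < value < limit
        print(name, "denormals supported:", supported, "value:", value)
```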
-
-
- Appendix B
-
-
- ; FILE: APFELM4.ASM
- ; assemble with MASM /e APFELM4 or TASM /e APFELM4
-
-
- CODE SEGMENT BYTE PUBLIC 'CODE'
- ASSUME CS: CODE
-
- PAGE ,120
-
- PUBLIC APPLE87;
-
- APPLE87 PROC NEAR
- PUSH BP ; save caller's base pointer
- MOV BP, SP ; make new frame pointer
- PUSH DS ; save caller's data segment
- PUSH SI ; save register
- PUSH DI ; variables
- LDS BX, [BP+04] ; pointer to parameter record
- FINIT ; init 80x87 FSP->R0
- FILD WORD PTR [BX+02] ; maxrad FSP->R7
- FLD QWORD PTR [BX+08] ; qmax FSP->R6
- FSUB QWORD PTR [BX+16] ; qmax-qmin FSP->R6
- DEC WORD PTR [BX+04] ; ymax-1
- FIDIV WORD PTR [BX+04] ; (qmax-qmin)/(ymax-1) FSP->R6
- FSTP QWORD PTR [BX+16] ; save delta_q FSP->R7
- FLD QWORD PTR [BX+24] ; pmax FSP->R6
- FSUB QWORD PTR [BX+32] ; pmax-pmin FSP->R6
- DEC WORD PTR [BX+06] ; xmax-1
- FIDIV WORD PTR [BX+06] ; delta_p FSP->R6
- MOV AX, [BX] ; save maxiter,[BX] needed for
- MOV [BX+2], AX ; 80x87 status now
- XOR BP, BP ; y=0
- FLD QWORD PTR [BX+08] ; qmax FSP->R5
- CMP WORD PTR [BX+40], 0 ; fast mode on 8087 desired ?
- JE yloop ; no, normal mode
- FSTCW [BX] ; save NDP control word
- AND WORD PTR [BX], 0FCFFh; set PCTRL = single precision
- FLDCW [BX] ; get back NDP control word
- yloop: XOR DI, DI ; x=0
- FLD QWORD PTR [BX+32] ; pmin FSP->R4
- xloop: FLDZ ; j**2= 0 FSP->R3
- FLDZ ; 2ij = 0 FSP->R2
- FLDZ ; i**2= 0 FSP->R1
- MOV CX, [BX+2] ; maxiter
- MOV DL, 41h ; mask for C0 and C3 cond.bits
- iteration: FSUB ST, ST(2) ; i**2-j**2 FSP->R1
- FADD ST, ST(3) ; i**2-j**2+p = i FSP->R1
- FLD ST(0) ; duplicate i FSP->R0
- FMUL ST(1), ST ; i**2 FSP->R0
- FADD ST, ST(0) ; 2i FSP->R0
- FXCH ST(2) ; 2*i*j FSP->R0
- FADD ST, ST(5) ; 2*i*j+q = j FSP->R0
- FMUL ST(2), ST ; 2*i*j FSP->R0
- FMUL ST, ST(0) ; j**2 FSP->R0
- FST ST(3) ; save j**2 FSP->R0
- FADD ST, ST(1) ; i**2+j**2 FSP->R0
- FCOMP ST(7) ; i**2+j**2 > maxrad? FSP->R1
- FSTSW [BX] ; save 80x87 cond.code FSP->R1
- TEST BYTE PTR [BX+1], DL ; test carry and zero flags
- LOOPNZ iteration ; until maxiter if not diverg.
- MOV DX, CX ; number of loops executed
- NEG CX ; carry set if CX <> 0
- ADC DX, 0 ; adjust DX if no. of loops<>0
-
- ; plot point here (DI = X, BP = y, DX has the color)
-
- FSTP ST(0) ; pop i**2 FSP->R2
- FSTP ST(0) ; pop 2ij FSP->R3
- FSTP ST(0) ; pop j**2 FSP->R4
- FADD ST,ST(2) ; p=p+delta_p FSP->R4
- INC DI ; x:=x+1
- CMP DI, [BX+6] ; x > xmax ?
- JBE xloop ; no, continue on same line
- FSTP ST(0) ; pop p FSP->R5
- FSUB QWORD PTR [BX+16] ; q=q-delta_q FSP->R5
- INC BP ; y:=y+1
- CMP BP, [BX+4] ; y > ymax ?
- JBE yloop ; no, picture not done yet
-
- groesser: POP DI ; restore
- POP SI ; register variables
- POP DS ; restore caller's data segm.
- POP BP ; restore caller's base pointer
- RET 4 ; pop parameters and return
- APPLE87 ENDP
-
- CODE ENDS
-
- END
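
For readers who do not follow x87 stack code, the inner loop labelled "iteration" can be paraphrased in Python. This is a sketch based on my reading of the listing, with hypothetical names; it keeps i**2, j**2 and 2*i*j between passes the way the register stack does, and it stops when i**2+j**2 exceeds maxrad or when maxiter passes are done. Python computes in double precision, whereas the coprocessor uses extended (or single, in fast mode) precision, so pass counts for points near the boundary may differ.

```python
def escape_passes(p: float, q: float, max_iter: int, max_rad: float) -> int:
    """Number of passes through the iteration loop for the point (p, q).

    Returns max_iter if the point has not diverged by then.
    """
    i_sq = 0.0      # i**2   (top of stack on loop entry)
    two_ij = 0.0    # 2*i*j
    j_sq = 0.0      # j**2
    for n in range(1, max_iter + 1):
        i = i_sq - j_sq + p          # FSUB / FADD: new real part
        j = two_ij + q               # FADD: new imaginary part
        i_sq = i * i                 # FMUL
        two_ij = (i + i) * j         # FADD doubles i, FMUL multiplies by j
        j_sq = j * j                 # FMUL
        if i_sq + j_sq > max_rad:    # FADD / FCOMP against maxrad
            return n
    return max_iter


if __name__ == "__main__":
    # Parameter values taken from the PeakFlop program
    print(escape_passes(-2.1, 1.2, 50, 30.0))
    print(escape_passes(0.0, 0.0, 50, 30.0))
```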
-
- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- UNIT Time;
-
- INTERFACE
-
- FUNCTION Clock: LONGINT; { same as VMS; time in milliseconds }
-
-
- IMPLEMENTATION
-
- FUNCTION Clock: LONGINT; ASSEMBLER;
- ASM
- PUSH DS { save caller's data segment }
- XOR DX, DX { initialize data segment to }
- MOV DS, DX { access ticker counter }
- MOV BX, 46Ch { offset of ticker counter in segm.}
- MOV DX, 43h { timer chip control port }
- MOV AL, 4 { freeze timer 0 }
- PUSHF { save caller's int flag setting }
- STI { allow update of ticker counter }
- LES DI, DS:[BX] { read BIOS ticker counter }
- OUT DX, AL { latch timer 0 }
- LDS SI, DS:[BX] { read BIOS ticker counter }
- IN AL, 40h { read latched timer 0 lo-byte }
- MOV AH, AL { save lo-byte }
- IN AL, 40h { read latched timer 0 hi-byte }
- POPF { restore caller's int flag }
- XCHG AL, AH { correct order of hi and lo }
- MOV CX, ES { ticker counter 1 in CX:DI:AX }
- CMP DI, SI { ticker counter updated ? }
- JE @no_update { no }
- OR AX, AX { update before timer freeze ? }
- JNS @no_update { no }
- MOV DI, SI { use second }
- MOV CX, DS { ticker counter }
- @no_update:NOT AX { counter counts down }
- MOV BX, 36EDh { load multiplier }
- MUL BX { W1 * M }
- MOV SI, DX { save W1 * M (hi) }
- MOV AX, BX { get M }
- MUL DI { W2 * M }
- XCHG BX, AX { AX = M, BX = W2 * M (lo) }
- MOV DI, DX { DI = W2 * M (hi) }
- ADD BX, SI { accumulate }
- ADC DI, 0 { result }
- XOR SI, SI { load zero }
- MUL CX { W3 * M }
- ADD AX, DI { accumulate }
- ADC DX, SI { result in DX:AX:BX }
- MOV DH, DL { move result }
- MOV DL, AH { from DL:AX:BX }
- MOV AH, AL { to }
- MOV AL, BH { DX:AX:BH }
- MOV DI, DX { save result }
- MOV CX, AX { in DI:CX }
- MOV AX, 25110 { calculate correction }
- MUL DX { factor }
- SUB CX, DX { subtract correction }
- SBB DI, SI { factor }
- XCHG AX, CX { result back }
- MOV DX, DI { to DX:AX }
- POP DS { restore caller's data segment }
- END;
-
-
- BEGIN
- Port [$43] := $34; { need rate generator, not square wave}
- Port [$40] := 0; { generator as prog. by some BIOSes }
- Port [$40] := 0; { for timer 0 }
- END. { Time }
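
The integer arithmetic at the end of Clock can be modelled in Python to see what it computes. The sketch below reflects my interpretation of the listing, not a statement by the author: the 32-bit BIOS tick count and the inverted 16-bit timer value are combined into one count of timer input clocks, which is multiplied by 36EDh, shifted right by 24 bits and then reduced by a small correction term. The names are hypothetical, and the timer input clock of 1193182 Hz used for comparison is an assumption (the nominal PC value).

```python
PIT_HZ = 1193182        # assumed timer input clock in Hz (nominal PC value)
MULTIPLIER = 0x36ED     # multiplier from the listing
CORRECTION = 25110      # correction constant from the listing


def combine(ticks: int, latched: int) -> int:
    """Combine BIOS tick count and latched timer 0 value into one count.

    The timer counts down, so its value is inverted (NOT AX in the listing).
    """
    return (ticks << 16) | (~latched & 0xFFFF)


def counts_to_ms(n: int) -> int:
    """Fixed-point conversion of a clock count to milliseconds."""
    r = ((n * MULTIPLIER) >> 24) & 0xFFFFFFFF    # DX:AX after the byte shuffle
    r -= ((r >> 16) * CORRECTION) >> 16          # MUL DX / SUB / SBB
    return r


if __name__ == "__main__":
    for seconds in (1, 60, 3600):
        n = seconds * PIT_HZ
        print(seconds, "s ->", counts_to_ms(n), "ms")
```

Under the assumed clock rate the model stays within a few milliseconds of the exact value over one hour.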
-
-
- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- {$A+,B-,R-,I-,V-,N+,E+}
- PROGRAM PeakFlop;
-
- USES Time;
-
- TYPE ParamRec = RECORD
- MaxIter, MaxRad, YMax, XMax: WORD;
- Qmax, Qmin, Pmax, Pmin: DOUBLE;
- FastMod: WORD;
- PlotFkt: POINTER;
- FLOPS:LONGINT;
- END;
-
- VAR Param: ParamRec;
- Start: LONGINT;
-
-
- {$L APFELM4.OBJ}
-
- PROCEDURE Apple87 (VAR Param: ParamRec); EXTERNAL;
-
-
- BEGIN
- WITH Param DO BEGIN
- MaxIter:= 50;
- MaxRad := 30;
- YMax := 30;
- XMax := 30;
- Pmin :=-2.1;
- Pmax := 1.1;
- Qmin :=-1.2;
- Qmax := 1.2;
- FastMod:= Word (FALSE);
- PlotFkt:= NIL;
- Flops := 0;
- END;
- Start := Clock;
- Apple87 (Param); { executes 104002 FLOP }
- Start := Clock - Start; { elapsed time in milliseconds }
- WriteLn ('Peak-MFLOPS: ', 104.002 / Start);
- END.
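
The expression 104.002 / Start folds two unit conversions into one constant: 104002 FLOP divided by a time in milliseconds. A minimal Python sketch of the same arithmetic follows; the function name mflops is hypothetical, and the guard against a zero time is my addition (the Pascal program would fail with a division by zero if Clock returned the same value twice).

```python
def mflops(flop_count: int, elapsed_ms: float) -> float:
    """Convert an operation count and a time in milliseconds to MFLOPS."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    flop_per_second = flop_count / (elapsed_ms / 1000.0)
    return flop_per_second / 1e6


if __name__ == "__main__":
    # 104002 FLOP is the count stated in the PeakFlop listing;
    # the 100 ms run time is an arbitrary example value.
    print(mflops(104002, 100.0))
```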
-
- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- ; FILE: M4X4.ASM
- ;
- ; assemble with TASM /e M4X4 or MASM /e M4X4
-
- CODE SEGMENT BYTE PUBLIC 'CODE'
-
- ASSUME CS:CODE
-
- PUBLIC MUL_4x4
- PUBLIC IIT_MUL_4x4
-
-
- FSBP0 EQU DB 0DBh, 0E8h ; declare special IIT
- FSBP1 EQU DB 0DBh, 0EBh ; instructions
- FSBP2 EQU DB 0DBh, 0EAh
- F4X4 EQU DB 0DBh, 0F1h
-
-
- ;---------------------------------------------------------------------
- ;
- ; MUL_4x4 multiplies a four-by-four matrix by an array of
- ; four-dimensional vectors. This operation is needed for 3D
- ; transformations in graphics data processing. There is one array for
- ; each vector component: one array containing all the x components,
- ; another containing all the y components and so on. Each component is
- ; an 8 byte IEEE floating point number. Two indices into the array of
- ; vectors are given. The first is the index of the vector that will be
- ; processed first, the second is the index of the vector processed
- ; last.
- ;
- ;---------------------------------------------------------------------
-
- MUL_4x4 PROC NEAR
-
- AddrX EQU DWORD PTR [BP+24] ; address of X component array
- AddrY EQU DWORD PTR [BP+20] ; address of Y component array
- AddrZ EQU DWORD PTR [BP+16] ; address of Z component array
- AddrW EQU DWORD PTR [BP+12] ; address of W component array
- AddrT EQU DWORD PTR [BP+8] ; addr. of 4x4 transform. mat.
- F EQU WORD PTR [BP+6] ; first vector to process
- K EQU WORD PTR [BP+4] ; last vector to process
- RetAddr EQU WORD PTR [BP+2] ; return address saved by call
- SavdBP EQU WORD PTR [BP+0] ; saved frame pointer
- SavdDS EQU WORD PTR [BP-2] ; caller's data segment
-
- PUSH BP ; save TURBO-Pascal frame pointer
- MOV BP, SP ; new frame pointer
- PUSH DS ; save TURBO-Pascal data segment
-
- MOV CX, K ; final index
- SUB CX, F ; final index - start index
- JNC $ok ; must not
- JMP $nothing ; be negative
- $ok: INC CX ; number of elements
-
- MOV SI, F ; init offset into arrays
- SHL SI, 1 ; each
- SHL SI, 1 ; element
- SHL SI, 1 ; has 8 bytes
-
- LDS DI, AddrT ; addr. of transformation mat.
- FLD QWORD PTR [DI] ; load a[0,0] = R7
- FLD QWORD PTR [DI+8] ; load a[0,1] = R6
-
- $mat_mul: LES BX, AddrX ; addr. of x component array
- FLD QWORD PTR ES:[BX+SI] ; load x[a] = R5
- LES BX, AddrY ; addr. of y component array
- FLD QWORD PTR ES:[BX+SI] ; load y[a] = R4
- LES BX, AddrZ ; addr. of z component array
- FLD QWORD PTR ES:[BX+SI] ; load z[a] = R3
- LES BX, AddrW ; addr. of w component array
- FLD QWORD PTR ES:[BX+SI] ; load w[a] = R2
-
- FLD ST(5) ; load a[0,0] = R1
- FMUL ST, ST(4) ; a[0,0] * x[a] = R1
- FLD ST(5) ; load a[0,1] = R0
- FMUL ST, ST(4) ; a[0,1] * y[a] = R0
- FADDP ST(1), ST ; a[0,0]*x[a]+a[0,1]*y[a]=R1
- FLD QWORD PTR [DI+16] ; load a[0,2] = R0
- FMUL ST, ST(3) ; a[0,2] * z[a] = R0
- FADDP ST(1), ST ; a[0,0]*x[a]...a[0,2]*z[a]=R1
- FLD QWORD PTR [DI+24] ; load a[0,3] = R0
- FMUL ST, ST(2) ; a[0,3] * w[a] = R0
- FADDP ST(1), ST ; a[0,0]*x[a]...a[0,3]*w[a]=R1
- LES BX, AddrX ; get address of x vector
- FSTP QWORD PTR ES:[BX+SI] ; write new x[a]
-
- FLD QWORD PTR [DI+32] ; load a[1,0] = R1
- FMUL ST, ST(4) ; a[1,0] * x[a] = R1
- FLD QWORD PTR [DI+40] ; load a[1,1] = R0
- FMUL ST, ST(4) ; a[1,1] * y[a] = R0
- FADDP ST(1), ST ; a[1,0]*x[a]+a[1,1]*y[a]=R1
- FLD QWORD PTR [DI+48] ; load a[1,2] = R0
- FMUL ST, ST(3) ; a[1,2] * z[a] = R0
- FADDP ST(1), ST ; a[1,0]*x[a]...a[1,2]*z[a]=R1
- FLD QWORD PTR [DI+56] ; load a[1,3] = R0
- FMUL ST, ST(2) ; a[1,3] * w[a] = R0
- FADDP ST(1), ST ; a[1,0]*x[a]...a[1,3]*w[a]=R1
- LES BX, AddrY ; get address of y vector
- FSTP QWORD PTR ES:[BX+SI] ; write new y[a]
-
- FLD QWORD PTR [DI+64] ; load a[2,0] = R1
- FMUL ST, ST(4) ; a[2,0] * x[a] = R1
- FLD QWORD PTR [DI+72] ; load a[2,1] = R0
- FMUL ST, ST(4) ; a[2,1] * y[a] = R0
- FADDP ST(1), ST ; a[2,0]*x[a]+a[2,1]*y[a]=R1
- FLD QWORD PTR [DI+80] ; load a[2,2] = R0
- FMUL ST, ST(3) ; a[2,2] * z[a] = R0
- FADDP ST(1), ST ; a[2,0]*x[a]...a[2,2]*z[a]=R1
- FLD QWORD PTR [DI+88] ; load a[2,3] = R0
- FMUL ST, ST(2) ; a[2,3] * w[a] = R0
- FADDP ST(1), ST ; a[2,0]*x[a]...a[2,3]*w[a]=R1
- LES BX, AddrZ ; get address of z vector
- FSTP QWORD PTR ES:[BX+SI] ; write new z[a]
-
- FLD QWORD PTR [DI+96] ; load a[3,0] = R1
- FMULP ST(4), ST ; a[3,0] * x[a] = R5
- FLD QWORD PTR [DI+104] ; load a[3,1] = R1
- FMULP ST(3), ST ; a[3,1] * y[a] = R4
- FLD QWORD PTR [DI+112] ; load a[3,2] = R1
- FMULP ST(2), ST ; a[3,2] * z[a] = R3
- FLD QWORD PTR [DI+120] ; load a[3,3] = R1
- FMULP ST(1), ST ; a[3,3] * w[a] = R2
- FADDP ST(1), ST ; a[3,3]*w[a]+a[3,2]*z[a]=R3
- FADDP ST(1), ST ; a[3,3]*w[a]...a[3,1]*y[a]=R4
- FADDP ST(1), ST ; a[3,3]*w[a]...a[3,0]*x[a]=R5
- LES BX, AddrW ; get address of w vector
- FSTP QWORD PTR ES:[BX+SI] ; write new w[a]
-
- ADD SI, 8 ; new offset into arrays
- DEC CX ; decrement element counter
- JZ $done ; no elements left, done
- JMP $mat_mul ; transform next vector
-
- $done: FSTP ST(0) ; clear
- FSTP ST(0) ; FPU stack
- $nothing: POP DS ; restore TP data segment
- POP BP ; restore TP frame pointer
- RET 24 ; pop parameters and return
-
- MUL_4x4 ENDP
-
-
- ;---------------------------------------------------------------------
- ;
- ; IIT_MUL_4x4 multiplies a four-by-four matrix by an array of
- ; four-dimensional vectors. This operation is needed for 3D
- ; transformations in graphics data processing. There is one array for
- ; each vector component: one array containing all the x components,
- ; another containing all the y components and so on. Each component is
- ; an 8 byte IEEE floating point number. Two indices into the array of
- ; vectors are given. The first is the index of the vector that will be
- ; processed first, the second is the index of the vector processed
- ; last. This subroutine uses the special instructions only available
- ; on IIT coprocessors to provide fast matrix multiply capabilities.
- ; So make sure to use it only on IIT coprocessors.
- ;
- ;---------------------------------------------------------------------
-
- IIT_MUL_4x4 PROC NEAR
-
- AddrX EQU DWORD PTR [BP+24] ; address of X component array
- AddrY EQU DWORD PTR [BP+20] ; address of Y component array
- AddrZ EQU DWORD PTR [BP+16] ; address of Z component array
- AddrW EQU DWORD PTR [BP+12] ; address of W component array
- AddrT EQU DWORD PTR [BP+8] ; addr. of 4x4 transf. matrix
- F EQU WORD PTR [BP+6] ; first vector to process
- K EQU WORD PTR [BP+4] ; last vector to process
- RetAddr EQU WORD PTR [BP+2] ; return address saved by call
- SavdBP EQU WORD PTR [BP+0] ; saved frame pointer
- SavdDS EQU WORD PTR [BP-2] ; caller's data segment
- Ctrl87 EQU WORD PTR [BP-4] ; caller's 80x87 control word
-
- PUSH BP ; save TURBO-Pascal frame ptr
- MOV BP, SP ; new frame pointer
- PUSH DS ; save TURBO-Pascal data seg.
- SUB SP, 2 ; make local variable
- FSTCW [Ctrl87] ; save 80x87 ctrl word
- LES SI, AddrT ; ptr to transformation matrix
- FINIT ; initialize coprocessor
- FSBP2 ; set register bank 2
- FLD QWORD PTR ES:[SI] ; load a[0,0]
- FLD QWORD PTR ES:[SI+32] ; load a[1,0]
- FLD QWORD PTR ES:[SI+64] ; load a[2,0]
- FLD QWORD PTR ES:[SI+96] ; load a[3,0]
- FLD QWORD PTR ES:[SI+8] ; load a[0,1]
- FLD QWORD PTR ES:[SI+40] ; load a[1,1]
- FLD QWORD PTR ES:[SI+72] ; load a[2,1]
- FLD QWORD PTR ES:[SI+104] ; load a[3,1]
- FINIT ; initialize coprocessor
- FSBP1 ; set register bank 1
- FLD QWORD PTR ES:[SI+16] ; load a[0,2]
- FLD QWORD PTR ES:[SI+48] ; load a[1,2]
- FLD QWORD PTR ES:[SI+80] ; load a[2,2]
- FLD QWORD PTR ES:[SI+112] ; load a[3,2]
- FLD QWORD PTR ES:[SI+24] ; load a[0,3]
- FLD QWORD PTR ES:[SI+56] ; load a[1,3]
- FLD QWORD PTR ES:[SI+88] ; load a[2,3]
- FLD QWORD PTR ES:[SI+120] ; load a[3,3]
-
- ; transformation matrix loaded
-
- MOV AX, F ; index of first vector
- MOV DX, K ; index of last vector
-
- MOV BX, AX ; index 1st vector to process
- MOV CL, 3 ; component has 8 (2**3) bytes
- SHL BX, CL ; compute offset into arrays
-
- FINIT ; initialize coprocessor
- FSBP0 ; set register bank 0
-
- $mat_loop:LES SI, AddrW ; addr. of W component array
- FLD QWORD PTR ES:[SI+BX] ; W component current vector
- LES SI, AddrZ ; addr. of Z component array
- FLD QWORD PTR ES:[SI+BX] ; Z component current vector
- LES SI, AddrY ; addr. of Y component array
- FLD QWORD PTR ES:[SI+BX] ; Y component current vector
- LES SI, AddrX ; addr. of X component array
- FLD QWORD PTR ES:[SI+BX] ; X component current vector
- F4X4 ; mul 4x4 matrix by 4x1 vector
- INC AX ; next vector
- MOV DI, AX ; copy index of next vector
- SHL DI, CL ; offset of vector into arrays
-
- FSTP QWORD PTR ES:[SI+BX] ; store X comp. of curr. vect.
- LES SI, AddrY ; address of Y component array
- FSTP QWORD PTR ES:[SI+BX] ; store Y comp. of curr. vect.
- LES SI, AddrZ ; address of Z component array
- FSTP QWORD PTR ES:[SI+BX] ; store Z comp. of curr. vect.
- LES SI, AddrW ; address of W component array
- FSTP QWORD PTR ES:[SI+BX] ; store W comp. of curr. vect.
-
- MOV BX, DI ; ofs nxt vect. in comp. arrays
- CMP AX, DX ; nxt vector past upper bound?
- JLE $mat_loop ; no, transform next vector
- FLDCW [Ctrl87] ; restore orig 80x87 ctrl word
-
- ADD SP, 2 ; get rid of local variable
- POP DS ; restore TP data segment
- POP BP ; restore TP frame pointer
- RET 24 ; pop parameters and return
- IIT_MUL_4x4 ENDP
-
- CODE ENDS
-
- END
-
- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-
- {$N+,E+}
-
- PROGRAM Trnsform;
-
- USES Time;
-
- CONST VectorLen = 8190;
-
- TYPE Vector = ARRAY [0..VectorLen] OF DOUBLE;
- VectorPtr = ^Vector;
- Mat4 = ARRAY [1..4, 1..4] OF DOUBLE;
-
- VAR X: VectorPtr;
- Y: VectorPtr;
- Z: VectorPtr;
- W: VectorPtr;
- T: Mat4;
- K: INTEGER;
- L: INTEGER;
- First: INTEGER;
- Last: INTEGER;
- Start: LONGINT;
- Elapsed:LONGINT;
-
- PROCEDURE MUL_4X4 (X, Y, Z, W: VectorPtr;
- VAR T: Mat4; First, Last: INTEGER); EXTERNAL;
- PROCEDURE IIT_MUL_4X4 (X, Y, Z, W: VectorPtr;
- VAR T: Mat4; First, Last: INTEGER); EXTERNAL;
-
- {$L M4X4.OBJ}
-
- BEGIN
- WriteLn ('Test8087 = ', Test8087);
- New (X);
- New (Y);
- New (Z);
- New (W);
- FOR L := 1 TO VectorLen DO BEGIN
- X^ [L] := Random;
- Y^ [L] := Random;
- Z^ [L] := Random;
- W^ [L] := Random;
- END;
- X^ [0] := 1;
- Y^ [0] := 1;
- Z^ [0] := 1;
- W^ [0] := 1;
- FOR K := 1 TO 4 DO BEGIN
- FOR L := 1 TO 4 DO BEGIN
- T [K, L] := (K-1)*4 + L;
- END;
- END;
- First := 0;
- Last := VectorLen;
- Start := Clock;
- MUL_4X4 (X, Y, Z, W, T, First, Last);
- { IIT_MUL_4X4 (X, Y, Z, W, T, First, Last); }
- Elapsed := Clock - Start;
- WriteLn ('Number of vectors: ', Last-First+1);
- WriteLn ('Time: ', Elapsed, ' ms');
- WriteLn ('Equivalent to ', (28.0*(Last-First+1)/1e6)/
- (Elapsed*1e-3):0:4, ' MFLOPS');
- WriteLn;
- WriteLn ('Last vector:');
- WriteLn;
- WriteLn (X^[Last]);
- WriteLn (Y^[Last]);
- WriteLn (Z^[Last]);
- WriteLn (W^[Last]);
- END.
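
What MUL_4x4 computes, and where the factor 28.0 in the MFLOPS formula comes from, can be shown in a few lines of Python. This is a sketch with hypothetical names, assuming the row-major matrix layout implied by the offsets in M4X4.ASM; it works on Python lists instead of far pointers and makes no attempt to mirror the register usage.

```python
# Each output component needs 4 multiplications and 3 additions,
# and there are 4 components: 4 * (4 + 3) = 28 FLOP per vector.
FLOP_PER_VECTOR = 4 * (4 + 3)


def transform_in_place(xs, ys, zs, ws, t, first, last):
    """Multiply the 4x4 matrix t by vectors first..last, overwriting them.

    xs, ys, zs, ws hold one component each (structure-of-arrays layout).
    Returns the number of vectors processed.
    """
    if last < first:
        return 0
    for a in range(first, last + 1):
        # Read all four old components before any of them is overwritten
        x, y, z, w = xs[a], ys[a], zs[a], ws[a]
        xs[a] = t[0][0] * x + t[0][1] * y + t[0][2] * z + t[0][3] * w
        ys[a] = t[1][0] * x + t[1][1] * y + t[1][2] * z + t[1][3] * w
        zs[a] = t[2][0] * x + t[2][1] * y + t[2][2] * z + t[2][3] * w
        ws[a] = t[3][0] * x + t[3][1] * y + t[3][2] * z + t[3][3] * w
    return last - first + 1


if __name__ == "__main__":
    # Same matrix as Trnsform: T[K, L] = (K-1)*4 + L, here zero-based
    t = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
    xs, ys, zs, ws = [1.0], [1.0], [1.0], [1.0]
    count = transform_in_place(xs, ys, zs, ws, t, 0, 0)
    print(xs[0], ys[0], zs[0], ws[0], count * FLOP_PER_VECTOR, "FLOP")
```

For the all-ones vector that Trnsform stores at index 0, each result is simply the sum of one matrix row.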
-