NetNews Usenet Archive 1992 #19


Text File | 1992-09-02 | 97.6 KB | 2,272 lines

Path: sparky!uunet!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!ira.uka.de!uka!uka!news
From: S_JUFFA@iravcl.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.lang.pascal
Subject: Turbo Pascal 6.0 bug list (long!)
Date: 2 Sep 1992 13:41:57 GMT
Organization: University of Karlsruhe (FRG) - Informatik Rechnerabt.
Lines: 2259
Distribution: world
Message-ID: <182gb5INNsm4@iraul1.ira.uka.de>
NNTP-Posting-Host: irav21.ira.uka.de
X-News-Reader: VMS NEWS 1.23

++++++++++++++++++++++ Bug List TURBO-Pascal 6.0 ++++++++++++++++++++++++++++++

This list is a compilation of all the bug reports I (Norbert Juffa, email:
S_JUFFA@IRAVCL.IRA.UKA.DE) sent to Borland between 10-01-90 and 07-28-92
regarding bugs in Turbo-Pascal 6.0 that have not been fixed up till now.
There were more bugs in the original release of TP 6.0, which Borland fixed
in a subsequent release of TP 6.0, so these are not included in this list.
For a more complete list of Turbo Pascal 6.0 bugs, look for the list
Duncan Murdoch (dmurdoch@mast.queensu.ca) irregularly publishes on Internet.
If you find any bug in TP 6.0, be it in the compiler, run-time library or
Turbo Vision, please send a description of the bug to Duncan. Include a
demonstration program that reliably reproduces the bug whenever possible.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1. Error in coprocessor underflow exception handler on i8087

This bug has also been present in version 5.5 of the compiler. It can lead
to some really strange program behavior in programs using operations on the
IEEE temporary floating point format (EXTENDED) when the program is executed
on an 8087 or 80287. This error is not reproducible on an 80387, 80287XL or
80486.
The bug may be demonstrated by having the following program run with an
8087/287 coprocessor:

  {$A+,B+,D+,E-,F-,G-,I+,L+,N+,O-,R+,S+,V+,X-}
  {$M 16384,0,655360}

  PROGRAM Bug87;   { demonstrates some strange behavior on 8087/287 }

  VAR X: EXTENDED; { allows storing of denormal }
      L: WORD;

  BEGIN
    WriteLn ('Turbo-Pascal 6.0 floating point exception bug demo program');
    WriteLn;
    WriteLn ('Continuously dividing 4e-4932 by 1.1...');
    WriteLn;
    X := 4e-4932;  { close to smallest normalized EXTENDED number }
    FOR L := 1 TO 5 DO BEGIN
      X := X / 1.1;
      Write (X:25);
      IF L > 1 THEN  { after 1st iter. underflow w/ flush to zero }
        WriteLn (' should be: ', 0.0:25)
      ELSE
        WriteLn (' should be: ', X:25);
    END;
  END. {Bug87}

The output of this program will look like this when executed on a system
with an 8087/287 coprocessor:

  Turbo-Pascal 6.0 floating point exception bug demo program

  Continuously dividing 4e-4932 by 1.1...

   3.6363636363636364E-4932 should be:  3.6363636363636364E-4932
   0.0000000000000000E+0000 should be:  0.0000000000000000E+0000
   1.1000000000000000E+0000 should be:  0.0000000000000000E+0000
   1.0000000000000000E+0000 should be:  0.0000000000000000E+0000
   9.0909090909090909E-0001 should be:  0.0000000000000000E+0000

The bug cannot be demonstrated using the coprocessor emulator, which does
not exhibit this error.

The problem is in the denormal exception handler for the coprocessor. Since
EXTENDED denormals are not supported by Turbo-Pascal on the 8087/287,
whenever such a denormal is loaded from memory into the coprocessor, it is
changed to a true zero (so called "flush to zero" response). Denormals can
only be stored to an EXTENDED type variable. When stored to DOUBLE or SINGLE
type variables, the rounding provided by the coprocessor will generate zero.
On loading the denormal, the coprocessor raises the denormal and underflow
exceptions. Since the denormal exception is unmasked by the Turbo-Pascal
start-up code, the appropriate hardware interrupt (INT 02, 'NMI' on a PC
type machine) is executed.
The INT02 handler of Turbo-Pascal 6.0 now performs the following steps:

First, it saves the coprocessor state using the FSTENV instruction of the
coprocessor. The saved state includes the control word, the status word, the
tag word, and the instruction pointer and opcode of the instruction causing
the exception.

Second, it does some analysis to figure out what kind of exception was the
cause of the interrupt, since all coprocessor exceptions will trap through
the same interrupt.

Third, after the handler is sure that a denormal triggered the interrupt, it
empties the top of stack register (TOS) of the coprocessor, which contains
the unwanted denormal, by executing a FSTP ST(0). After discarding the
denormal, it loads a true zero with FLDZ.

Finally, the handler exits through its standard exit, using FLDENV to
restore the coprocessor environment. This is where things start to go wrong
on an 8087/287. Although the TOS now contains zero, the associated tag code
for that register still holds a 'special' tag, because the old tag word was
reloaded, thus mirroring the state of the coprocessor before the coprocessor
exception trap was taken. The tag code for the TOS should now contain a
'zero' tag. Register contents and coprocessor tag word are not consistent at
this stage. This causes the coprocessor to ignore the next instruction
involving that register, which in the above example program is an FDIVP
instruction. The divisor will be left on the coprocessor stack and stored to
memory instead of the quotient, thus giving 1.1 as a result in the above
demonstration program.

The error described should only occur on the 8087/80287, not on the 80387.
From the 80387 on, Intel coprocessors examine tag codes only to distinguish
empty ('11') from nonempty ('00', '01', '10') registers.

This error should be quite easy to fix. After discarding the denormal and
replacing it with zero, the saved NDP state's memory image must be changed
to reflect the new register contents.
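As a rough model of that correction (a Python sketch, not handler code; the
function and constant names are made up for illustration), the tag code of
the stack-top register can be stepped from SPECIAL to ZERO by subtracting a
single bit whose position is derived from the saved stack-top field. The
sketch assumes the usual x87 layout: the stack-top field in bits 11..13 of
the status word and two tag bits per physical register in the tag word.

```python
# Hypothetical model of the tag-word correction; names are illustrative only.
TAG_ZERO = 0b01
TAG_SPECIAL = 0b10


def fix_tos_tag(saved_status_word: int, saved_tag_word: int) -> int:
    """Return the tag word with the stack-top tag changed from SPECIAL to ZERO."""
    top = (saved_status_word >> 11) & 0b111   # stack-top field, 0..7
    shift = 2 * top                           # two tag bits per register
    if (saved_tag_word >> shift) & 0b11 != TAG_SPECIAL:
        raise ValueError("stack-top register is not tagged SPECIAL")
    # Subtracting 1 at the tag position turns '10' into '01'.
    return (saved_tag_word - (1 << shift)) & 0xFFFF


if __name__ == "__main__":
    # Stack top = 5, register 5 tagged SPECIAL, all other registers empty.
    print(hex(fix_tos_tag(5 << 11, 0xFBFF)))
```

Because the tag is known to be '10' at this point, a plain subtraction
cannot borrow into the neighbouring tag fields.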
First, extract the value of ST from the saved status word memory image. Then
generate a mask for the TOS's tag code such that subtracting it from the
saved tag word memory image will decrement the tag code for the TOS from
'10' (= SPECIAL) to '01' (= ZERO). The code might look like this:

        .
        FSTP    ST(0)                   ; dispose unwanted denormal
        FLDZ                            ; load zero instead
        PUSH    CX                      ; only DS, AX, and BX saved so far
        MOV     CL, [SavedStatusWord+1] ; get saved NDP status word MSB
        AND     CL, 00111000b           ; extract stack top field (0..56)
        SHR     CL, 1                   ; generate rotate
        SHR     CL, 1                   ; counter (0,2,4,..14)
        MOV     AX, 1                   ; load initial mask
        ROL     AX, CL                  ; generate correct number for TOS
        SUB     [SavedTagWord], AX      ; correct TOS tag code
        POP     CX                      ; don't need it any longer
        .
        .


2. Bugs in the handling of denormal numbers of types SINGLE, DOUBLE,
   EXTENDED (IEEE numeric data types)

One of the important features of the IEEE-754 Standard for Binary Floating
Point Arithmetic [1] is that it demands the implementation of denormal
numbers. If the result of a computation cannot be represented as a
normalized number (biased exponent > 0, 1 <= mantissa < 2) there is an
*underflow* condition. The required response from an IEEE-754 compliant
system to an underflow is the generation of a denormal number, if that is at
all possible (the number could be too small to be represented even as a
denormal). Denormal numbers have a biased exponent of zero, the mantissa can
have any value between 01....0 and 00....1.

Turbo Pascal 6.0 generally supports denormal numbers for its IEEE floating
point types SINGLE, DOUBLE, and EXTENDED. However, there are some bugs and
undocumented features involved in the handling of denormals.

1) Coprocessor emulator does not support denormal numbers

   Presumably for performance reasons, support for denormals of any IEEE
   data type is not included in the coprocessor emulator.
   If the result of an operation cannot be represented as a normalized
   number, the emulator will convert it to zero ("flush to zero" response).
   This is one of the many instances where the emulator differs from a real
   coprocessor and which is *not* documented in the manuals. The emulator's
   inability to support denormals can result in unexplainable differences in
   the results of the same computation depending on whether the emulator or
   a real coprocessor was used. There are real applications that underflow
   quite frequently, so the support of gradual underflow with the help of
   denormals can make for differences in the final results [2]. The
   inability of the emulator to support denormals should be flagged as a
   bug.

2) There is an undocumented quirk in the handling of denormalized numbers
   when using a program compiled with $N+ with a real coprocessor.

   On an 8087 and the original 80287, denormal numbers are only supported
   for the SINGLE and DOUBLE types; EXTENDED type denormals are flushed to
   zero upon being loaded. On the 80287XL, 80387, and 80486 denormals are
   supported for SINGLE, DOUBLE and EXTENDED. This difference is due to
   differences in the coprocessors and how Turbo Pascal initializes them.

   On an 8087/80287 loading a denormal of any data type will raise a
   denormalized number exception. For an 8087/287 TP has this exception
   unmasked. The exception response for the 8087/287 in TP is to normalize
   SINGLE and DOUBLE numbers in the internal (EXTENDED) format (note that
   the 8087/287 do not do this automatically) and to flush EXTENDED
   denormals to zero (probably because denormals in the internal format give
   rise to further complications on an 8087/287, especially since the
   denormal exception is unmasked, which it has to be to enable at least
   correct operation on SINGLE and DOUBLE denormals).

   On an 80287XL/387/486 loading a SINGLE or DOUBLE denormal will raise a
   denormalized number exception, while loading an EXTENDED denormal will
   *not* raise that exception [3].
   For these coprocessors, TP has the denormal exception masked. The masked
   response to a denormal exception on these coprocessors is to
   automatically normalize SINGLE and DOUBLE denormals in the internal
   (EXTENDED) format. Since the denormal exception is masked and the
   handling of denormals has been enhanced for the 287XL/387/486, operations
   on EXTENDED denormals are safe on these coprocessors.

   The approach chosen for TP is reasonable. Computational results can vary
   depending on the coprocessor used, though. Therefore, the different
   handling of denormals for 8087/287 and 80287XL/387/486 must be documented
   in TP's manuals.

3) There is a bug in the float to string conversion routine in the TP 6.0
   run time library that causes EXTENDED denormals to be printed as zero,
   even on coprocessors where EXTENDED denormals are supported by TP 6.0.
   This is caused by an incomplete test that results in everything with a
   zero exponent being printed as zero. Unfortunately, EXTENDED denormals
   also have a zero exponent. The conversion routine can be enhanced to
   correctly print out EXTENDED denormals with very little additional code
   and only marginally increased timing overhead for the normalized number
   conversion.

4) The compiler does not allow typed constants of types SINGLE, DOUBLE, and
   EXTENDED to be initialized with a constant representing a denormal.
   Without warning, zero is assigned to these constants. For example the
   declaration

     CONST Foo: SINGLE = 1e-40;
           Bar: DOUBLE = 1e-320;

   results in zero being stored to typed constants Foo and Bar. This problem
   is probably caused by the use of the emulator routines to compile
   programs with the $N+ flag. Since this behavior is not documented in the
   manuals and no warning is given during compilation, it is considered a
   bug.
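That 1e-40 and 1e-320 really are representable as denormals, and so need not
become zero, can be checked outside Turbo Pascal. The following is a small
Python sketch (helper names are hypothetical) that converts the two literals
to the IEEE single and double formats and splits off the biased exponent and
fraction fields; a denormal shows a zero exponent field with a nonzero
fraction.

```python
import struct


def single_fields(x: float):
    """Biased exponent and fraction of x rounded to IEEE single."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x: float):
    """Biased exponent and fraction of x as IEEE double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


if __name__ == "__main__":
    # The literals from the CONST declaration above.
    print("Foo (1e-40  as SINGLE):", single_fields(1e-40))
    print("Bar (1e-320 as DOUBLE):", double_fields(1e-320))
```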
The following program tests the support for denormals and the correct
printing of denormals:

  {$N+,E+}

  PROGRAM DenormTst;

  VAR E: EXTENDED;
      D: DOUBLE;
      S: SINGLE;

  BEGIN
    WriteLn ('Testing support and printing of denormals');
    WriteLn;
    Write ('Coprocessor is: ');
    CASE Test8087 OF
      0: WriteLn ('Emulator');
      1: WriteLn ('8087 or compatible');
      2: WriteLn ('80287 or compatible');
      3: WriteLn ('80387 or compatible');
    END;
    WriteLn;
    S := 1.18e-38;
    S := S * 3.90625e-3;
    IF S = 0 THEN
      WriteLn ('SINGLE denormals not supported')
    ELSE BEGIN
      WriteLn ('SINGLE denormals supported');
      WriteLn ('SINGLE denormal prints as: ', S);
      WriteLn ('Denormal should be printed as 4.60943...E-0041');
    END;
    WriteLn;
    D := 2.24e-308;
    D := D * 3.90625e-3;
    IF D = 0 THEN
      WriteLn ('DOUBLE denormals not supported')
    ELSE BEGIN
      WriteLn ('DOUBLE denormals supported');
      WriteLn ('DOUBLE denormal prints as: ', D);
      WriteLn ('Denormal should be printed as 8.75...E-0311');
    END;
    WriteLn;
    E := 3.37e-4932;
    E := E * 3.90625e-3;
    IF E = 0 THEN
      WriteLn ('EXTENDED denormals not supported')
    ELSE BEGIN
      WriteLn ('EXTENDED denormals supported');
      WriteLn ('EXTENDED denormal prints as: ', E);
      WriteLn ('Denormal should be printed as 1.3164...E-4934');
    END;
  END.

References:

[1] IEEE Standard for Binary Floating-Point Arithmetic. ANSI/IEEE Std
    754-1985. New York, NY: Institute of Electrical and Electronics
    Engineers 1985
[2] Goldberg, D.: Computer Arithmetic. In: Hennessy, J.L.; Patterson, D.A.:
    Computer Architecture - A Quantitative Approach. San Mateo, CA: Morgan
    Kaufmann 1990. Page A-27
[3] Intel: 387DX User's Manual. Programmers Reference. Intel 1989


3. Error in EXTENDED to string conversion (Str, Write)

There is an error in the internal conversion routine Float2Str that converts
an EXTENDED number to a string of decimal digits. This bug causes some NANs
to be printed as INFs. The code in the routine fails to do a complete check
on the mantissa to figure out if it is an INF. It checks only the sixteen
most significant mantissa bits.
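The effect of such an incomplete test can be modelled outside the run-time
library. The Python sketch below is only an illustration: the function names
are invented, the five-word layout (least significant mantissa word first,
sign and exponent word last) is assumed to match the EXTENDED memory image,
and the sign bit is assumed to be masked off before the exponent test.

```python
def classify_top16_only(words):
    """Model of a test that looks only at the top 16 mantissa bits."""
    w0, w2, w4, w6, sign_exp = words
    if sign_exp & 0x7FFF != 0x7FFF:
        return "normal"
    return "INF" if w6 == 0x8000 else "NAN"


def classify_full(words):
    """Model of a test that inspects all 64 mantissa bits."""
    w0, w2, w4, w6, sign_exp = words
    if sign_exp & 0x7FFF != 0x7FFF:
        return "normal"
    if (w0 | w2 | w4) != 0:          # any of the low 48 bits set: NAN
        return "NAN"
    return "INF" if w6 == 0x8000 else "NAN"


if __name__ == "__main__":
    nan_low_bit = (0x0001, 0x0000, 0x0000, 0x8000, 0x7FFF)
    infinity = (0x0000, 0x0000, 0x0000, 0x8000, 0x7FFF)
    for value in (nan_low_bit, infinity):
        print(classify_top16_only(value), classify_full(value))
```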
Therefore, NANs with mantissa between 8000000000000001h and
8000FFFFFFFFFFFFh are printed as INF. The following program demonstrates the
bug:

  {$N+,E+}

  PROGRAM INFBug;

  VAR X:  EXTENDED;
      XA: ARRAY [1..5] OF WORD ABSOLUTE X;

  BEGIN
    WriteLn ('Testing correct printing of NANs');
    XA [5] := $7FFF;
    XA [4] := $8000;
    XA [3] := $0000;
    XA [2] := $0000;
    XA [1] := $0001;
    WriteLn ('First NAN (7FFF 8000 0000 0000 0001) prints as: ', X);
    XA [5] := $FFFF;
    XA [4] := $8000;
    XA [3] := $0000;
    XA [2] := $8000;
    XA [1] := $0000;
    WriteLn ('Second NAN (FFFF 8000 0000 8000 0000) prints as: ', X);
    XA [5] := $7FFF;
    XA [4] := $8000;
    XA [3] := $4000;
    XA [2] := $0000;
    XA [1] := $0000;
    WriteLn ('Third NAN (7FFF 8000 4000 0000 0000) prints as: ', X);
  END.

The following is an excerpt from the Float2Str routine showing the faulty
code that causes the bug:

        CMP     AX,7FFFH          ; INF or NAN ? (check exponent)
        JNE     @@10              ; no, Normal
        CMP     Value.w6,8000H    ; INF ?  <----- incomplete check !
        JE      @@3               ; yes, print INF
        MOV     AX,'AN'           ; no, print NAN
        STOSW
        MOV     AL,'N'
        STOSB

This fragment should be replaced by the following code which eliminates the
bug:

        CMP     AX,7FFFH          ; NAN or INF ? (check exponent)
        JNE     @@10              ; no, Normal
        MOV     DX, Value.w0      ; if any of
        OR      DX, Value.w2      ; last 48 mantissa bits
        OR      DX, Value.w4      ; is not zero,
        JNZ     @3a               ; must be NAN
        CMP     Value.w6, 8000H   ; INF ?
        JE      @@3               ; yes, print INF
  @3a:  MOV     AX,'AN'           ; no, print NAN
        STOSW
        MOV     AL,'N'
        STOSB


4. Bug in string -> LONGINT conversion

The VAL and READ procedures for LONGINTs do not allow the smallest LONGINT
number -2147483648 (-2^31) to be read in decimal form. It can be entered in
hexadecimal form $80000000 though. VAL and READ should be changed to allow
all valid LONGINTs to be read, especially since this would not slow down the
conversion process if done properly.


5. Bug in Random function for $N+ state

The Random function in programs compiled with $N+ can return the number 1,
although Random is specified to deliver values strictly smaller than 1.
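A minimal Python model of how an exact 1 can arise, under the assumption
that the generator's raw 32-bit output is interpreted as a signed integer,
made non-negative and scaled by 2^-31. The function names are hypothetical,
and the divisor 2^32 in the second function is an assumption about how a
wider, unsigned load might be scaled, not something taken from the run-time
library.

```python
def random_signed_load_model(raw32: int) -> float:
    """Treat raw32 as signed 32-bit, take the absolute value, scale by 2^-31."""
    signed = raw32 - (1 << 32) if raw32 & 0x80000000 else raw32
    return abs(signed) / 2.0 ** 31


def random_wide_load_model(raw32: int) -> float:
    """Treat raw32 as unsigned (as if loaded in a wider format), scale by 2^-32."""
    return raw32 / 2.0 ** 32


if __name__ == "__main__":
    worst_case = 0x80000000
    print(random_signed_load_model(worst_case))
    print(random_wide_load_model(worst_case))
```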
This error occurs since the unsigned 32-bit integer delivered by the random
number generator is read into the coprocessor as a signed 32-bit integer. To
avoid negative numbers, the absolute value is then taken before the number
is divided by 2^31. If, however, the 32-bit integer delivered by the random
number generator is 80000000h it will be converted to 2^31 by taking the
absolute value in the coprocessor. Division by 2^31 will then return 1. The
Random routine should be changed to read the 32-bit integers in a 64-bit
format, thus avoiding the negative number problem and the FABS.


6. Differences in compile-time and run-time evaluation of certain functions

It seems that at least the Round function behaves differently at compile
time as compared to run time evaluation as demonstrated by the following
program:

  {$N+,E+}

  PROGRAM RoundBug;

  CONST Y = 4.5;
        J = Round (Y);

  VAR   I: INTEGER;
        X: EXTENDED;

  BEGIN
    X := Y;
    I := Round (X);
    WriteLn (I:5, J:5);
  END.

One would expect I and J to be equal in the output, but actual output is:

      4    5

The problem here is that the run-time version of Round uses the
Coprocessor/Emulator which provides correct IEEE-754 rounding to nearest or
even, while the compile time version of Round uses the REAL software
arithmetic with simple round to nearest or up. The problem can easily be
solved by implementing correct IEEE style rounding for REAL arithmetic,
which is strongly recommended regardless of the bug given above.


7. Documentation enhancement needed with regard to Sin/Cos functions

When in $N+ mode, a call to the Sin and Cos functions with an argument whose
absolute value is > 2^63 = 9.22e18 will result in an error. This makes
sense, since a total loss of precision will occur outside this range.
However, the current version of the documentation does not document this
type of error. In addition, the REAL type software arithmetic ($N-) will
return zero for the sine and cosine of large arguments. It does not raise an
error.
When doing REAL computations in $N+ mode, an error occurs in these cases.
This difference in REAL arithmetic between $N+ and $N- mode should be
explained in the TP 6.0 manuals.


8. Errors in REAL type software arithmetic

There is an error in the REAL-Add/Subtract routine of the TP 6.0 runtime
library that may cause results to be less accurate than would be possible.
Before shifting the smaller operand's mantissa to the right for alignment
prior to mantissa addition, a test is performed whether the shift would be
for more than 39 bits. It is assumed that any shift count >= 40 would make
the second operand so small that it cannot affect the result. This was true
as long as Turbo-Pascal truncated results in REAL arithmetic, which was the
case up to and including version 5.0 of the compiler. Due to the rounding
introduced with TP 5.5 the above assumption no longer holds. Because of
rounding, which takes place at the 41st mantissa bit, a carry may propagate
to the significant 40th bit of the final result. Therefore, only when the
shift count needed for alignment is greater than or equal to 41 should the
mantissa addition be skipped.

There are errors in constants used by the REAL-Exp and REAL-Ln routines that
cause some unnecessary inaccuracies in the function results. The constant
Sqrt(2) in EXP should be coded as 81 FA 33 F3 04 35, not as
81 FB 33 F3 04 35, since the mantissa to six bytes of accuracy would be
0.3504F333F9DE. Likewise, the constant 0.5*Sqrt(2) used by LN should be
80 FA 33 F3 04 35.

The REAL-Exp function returns with a runtime error 205 (overflow) when
called with an argument smaller than about -88.029. However, even an
argument of -88.72 would still deliver a result bigger than the smallest
normalized REAL number, which is 2^-128 or 2.94e-39. Thus, Exp does not make
use of the available argument range. Exp should not abort with an error when
called with very small arguments anyhow.
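The figures can be checked numerically. The Python sketch below uses IEEE
doubles rather than the 6-byte REAL format, so it only illustrates the
thresholds; the function name and the zero-returning behavior are a
hypothetical model of an Exp that underflows quietly instead of aborting,
not the run-time library's code.

```python
import math

SMALLEST_NORMALIZED_REAL = 2.0 ** -128     # about 2.94e-39


def exp_quiet_underflow(x: float) -> float:
    """Return Exp(x), or zero once the result drops below 2^-128."""
    if x < math.log(SMALLEST_NORMALIZED_REAL):
        return 0.0
    return math.exp(x)


if __name__ == "__main__":
    # Argument below which the result falls under 2^-128 ...
    print(math.log(2.0 ** -128))
    # ... compared with the argument at which 2^-127 is reached.
    print(math.log(2.0 ** -127))
    print(exp_quiet_underflow(-88.72), exp_quiet_underflow(-100.0))
```

The second printed value lies close to the -88.029 quoted above, which
suggests that the existing limit corresponds to 2^-127 rather than 2^-128.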
Since the exponential function approaches zero as the argument approaches
negative infinity, it should simply return zero when the result is too small
to be represented as a normalized real number. This behavior of Exp would be
consistent with the rest of the REAL operations, since REAL arithmetic
always takes the "flush to zero" approach when results underflow.


9. Error in REAL-type multiplication

The multiplication routine for REAL type software arithmetic provides a
faster multiplication if all but the first sixteen bits of the mantissa in
one of the factors are zero. This makes multiplication much faster when a
floating point number is to be multiplied by a small integer, such as in
3.1415926*10, since the converted integer does not use more than the first
sixteen mantissa bits. This part of the multiplication contains a logical
error. The bug will introduce a relative error of at most 3e-12 in the
result, whereas all other basic arithmetical operations (with the exception
of addition, see above) are accurate to the theoretical limit of the
arithmetic (9.09e-13). The bug causes incorrect results in about 3% of the
described type of multiplications. The error is caused by not using all
necessary mantissa bits in the computation. The code looks as follows:

        .
        .
        8BC5   MOV AX, BP       ; the partial product
        8AC4   MOV AL, AH       ; of CH * (LSB of BP) is
        F6E5   MUL CH           ; not taken into consideration
        8BD8   MOV BX, AX       ; by this computation
        .

The bug can be eliminated by applying the following patch:

        89C8   MOV AX, CX       ; this code performs the correct
        F7E5   MUL BP           ; computation. Note that the
        89D3   MOV BX, DX       ; correct value stored in BX
        90     NOP              ; may exceed the corresponding
        90     NOP              ; value above by up to three


10. Incorrectly restricted argument range for REAL arithmetic Round/Trunc

When using REAL arithmetic ($N- mode), the Round/Trunc functions raise an
error for all inputs that would cause the smallest LONGINT number
-2147483648 to be returned.
Thus inputs to Round are restricted to -2147483647.5 < x < 2147483647.5,
although the correct available argument range should be
-2147483648.5 <= x < 2147483647.5. Inputs to Trunc are similarly incorrectly
restricted to -2147483648 < x < 2147483648 where the correct range would be
-2147483649 < x < 2147483648. This bug can be corrected with no increase in
execution time. Correct implementation of Round/Trunc can be tested with the
ROUNDTST program given below. Try a run with {$N+} and then try it with
{$N-}.

  PROGRAM RoundTst;

  VAR X,Y,Z: REAL;
      I:     LONGINT;

  BEGIN
    Y := 4.5;
    Z := 5.5;
    WriteLn ('Testing implementation of Round/Trunc for correct range',
             ' and IEEE-rounding');
    WriteLn;
    WriteLn;
    Write ('Testing range of Round towards lower limit ... ');
    X := -2147483647.0;
    REPEAT
      I := Round (X);
      (* WriteLn (X+2147483648.0);*)
      X := X - 1.0/256.0;
    UNTIL X < -2147483648.5;
    WriteLn ('passed');
    WriteLn;
    Write ('Testing range of Round towards upper limit ... ');
    X := 2147483647.0;
    REPEAT
      I := Round (X);
      (* writeln (x-2147483648.0);*)
      X := X + 1.0/256.0;
    UNTIL X >= 2147483647.5;
    WriteLn ('passed');
    WriteLn;
    Write ('Testing range of Trunc towards lower limit ... ');
    X := -2147483647.0;
    REPEAT
      I := Trunc (X);
      (* writeln (x+2147483648.0);*)
      X := X - 1.0/256.0;
    UNTIL X <= -2147483649.0;
    WriteLn ('passed');
    WriteLn;
    Write ('Testing range of Trunc towards upper limit ... ');
    X := 2147483647.0;
    REPEAT
      I := Trunc (X);
      (* writeln (x-2147483648.0);*)
      X := X + 1.0/256.0;
    UNTIL X >= 2147483648.0;
    WriteLn ('passed');
    WriteLn;
    Write ('Round (4.5) should be: 4, actual value is: ', Round (Y));
    IF Round (Y) = 4 THEN WriteLn (' passed') ELSE WriteLn (' failed');
    Write ('Round (5.5) should be: 6, actual value is: ', Round (Z));
    IF Round (Z) = 6 THEN WriteLn (' passed') ELSE WriteLn (' failed');
    WriteLn;
    Y := -4.5;
    Z := -5.5;
    Write ('Round (-4.5) should be:-4, actual value is:', Round (Y));
    IF Round (Y) =-4 THEN WriteLn (' passed') ELSE WriteLn (' failed');
    Write ('Round (-5.5) should be:-6, actual value is:', Round (Z));
    IF Round (Z) =-6 THEN WriteLn (' passed') ELSE WriteLn (' failed');
  END.


11. Errors in coprocessor emulator

Certain coprocessor instructions will not be correctly emulated by the
emulation package of Turbo-Pascal 6.0. This bug has also been present in all
previous versions of the emulator. The following program will demonstrate
the faulty emulation of FDECSTP:

  {$A+,B-,D+,E+,F-,G-,I-,L+,N+,O-,R-,S-,V+,X-}
  {$M 16384,0,655360}

  PROGRAM EMUBUG;

  VAR StackPointer:        BYTE;
      Control87, Status87: WORD;

  BEGIN
    WriteLn ('Turbo Pascal 6.0 coprocessor emulator bug demo program');
    WriteLn;
    IF Test8087 <> 0 THEN
      WriteLn ('Initializing coprocessor')
    ELSE
      WriteLn ('Initializing emulator');
    WriteLn ('Loading π, 1, and 0 into coprocessor / emulator');
    ASM
      FSTCW  [Control87]      { save control word as set by Turbo Pascal }
      FINIT                   { initialize coprocessor / emulator        }
      FLDPI                   { load π, stack pointer = 7                }
      FLD1                    { load 1, stack pointer = 6                }
      FLDZ                    { load 0, stack pointer = 5                }
      FSTSW  [Status87]       { save status word containing stack pointer}
      FWAIT                   { wait until saved                         }
      MOV    AX, [Status87]   { load status word                         }
      AND    AH, 38h          { extract stack pointer field              }
      SHR    AH, 1            { make                                     }
      SHR    AH, 1            { it                                       }
      SHR    AH, 1            { right-aligned in byte                    }
      MOV    [StackPointer], AH { store stack pointer value              }
    END;
    IF Test8087 = 0 THEN
      WriteLn ('emulator stack pointer now: ', StackPointer,
               ' should be: 5')
    ELSE
      WriteLn ('coprocessor stack pointer now: ', StackPointer,
               ' should be: 5');
    WriteLn ('executing / emulating FDECSTP instruction');
    ASM
      FDECSTP                 { decrement coprocessor/emulator stack ptr }
      FSTSW  [Status87]       { store status word containing stack ptr   }
      FWAIT                   { wait until stored                        }
      MOV    AH, BYTE PTR [Status87+1] { get status word MSB             }
      AND    AH, 38h          { isolate stack ptr field in status word   }
      SHR    AH, 1            { make                                     }
      SHR    AH, 1            { it right-                                }
      SHR    AH, 1            { aligned in byte                          }
      MOV    [StackPointer], AH { store stack pointer value              }
    END;
    IF Test8087 = 0 THEN
      WriteLn ('emulator stack pointer now: ', StackPointer,
               ' should be: 4')
    ELSE
      WriteLn ('coprocessor stack pointer now: ', StackPointer,
               ' should be: 4');
    ASM
      FINIT                   { initialize coprocessor/emulator          }
      FLDCW  [Control87]      { restore TURBO Pascal control word        }
    END;
  END.

When the above program is run with a coprocessor, the results will be as
expected:

  Turbo Pascal 6.0 coprocessor emulator bug demo program

  Initializing coprocessor
  Loading π, 1, and 0 into coprocessor / emulator
  coprocessor stack pointer now: 5 should be: 5
  executing / emulating FDECSTP instruction
  coprocessor stack pointer now: 4 should be: 4

However, when run with the emulator, strange things happen:

  Turbo Pascal 6.0 coprocessor emulator bug demo program

  Initializing emulator
  Loading π, 1, and 0 into coprocessor / emulator
  emulator stack pointer now: 5 should be: 5
  executing / emulating FDECSTP instruction
  emulator stack pointer now: 6 should be: 4

It seems that at least FINCSTP and FDECSTP are incorrectly emulated. Tests
show that FINCSTP actually decreases the stack pointer, while FDECSTP
increases it. This is caused by faulty entries in the jump index table for
emulated opcodes D9E0 to D9FF. Exchanging the indices for FINCSTP and
FDECSTP will cause FINCSTP to function correctly, but because of another
error FDECSTP will disturb the emulated registers of the 80x87 emulator,
which it shouldn't do. FINCSTP and FDECSTP instructions will not be
generated by the compiler.
However, programs that link with modules written in assembly language or use
the new ASM directive of Turbo-Pascal 6.0 might contain them. When run with
the emulator, these programs will behave oddly or might even crash.

In addition, the wraparound stack addressing provided by the coprocessor is
unavailable on the emulator. On the coprocessor, an instruction such as
FADD ST(7),ST would write to register #2 if the current stacktop was three.
The emulator computes a linear offset and tries to write to a
non-implemented register #10. In doing so, it destroys other emulator data
residing at offsets C0h to E5h in the stack segment, just above the
emulator's register file (60h to BFh). This will cause the emulator to
malfunction or will crash the program.

The best way to get rid of these bugs would be to fix the emulator so that
it correctly emulates all coprocessor instructions and the wraparound
addressing. This should prove not to be too difficult, since the faulty
instructions are among the easiest to emulate. When figuring out which
register is meant in stack top relative addressing, the result should be
taken modulo 8 to provide correct wraparound. This should be quite easy.
Additional code needed will be at a minimum.

The other solution would be to trap all unemulated or faulty instructions
(e.g. FINCSTP, FADD ST(7),ST) upon invocation of the emulator. The emulator
would then emit an error message such as 'run-time error xx, unemulated
coprocessor instruction' and abort the program. In this case, the
documentation should provide explicit information which instructions are not
emulated and should not be used if the program is to perform correctly with
the emulator. Also, all other differences between coprocessor and emulator
should be explained.


12. Deficiencies of coprocessor emulator

The emulator of Turbo Pascal 6.0 does not emulate the following features of
a physical coprocessor: precision control and rounding control.
This can be proved by running the following programs with and without a
coprocessor.

  {$N+,E+}

  PROGRAM PCtrlTst;

  VAR B:            EXTENDED;
      Precision, L: WORD;

  PROCEDURE SetPrecisionControl (Precision: WORD);
  (* This procedure sets the internal precision of the NDP. Available *)
  (* precision values: 0 - 24 bits (SINGLE)                           *)
  (*                   1 - n.a. (mapped to single)                    *)
  (*                   2 - 53 bits (DOUBLE)                           *)
  (*                   3 - 64 bits (EXTENDED)                         *)

  VAR CtrlWord: WORD;

  BEGIN {SetPrecisionControl}
    IF Precision = 1 THEN
      Precision := 0;
    Precision := Precision SHL 8; { make mask for PC field in ctrl word }
    ASM
      FSTCW  [CtrlWord]       { store NDP control word             }
      MOV    AX, [CtrlWord]   { load control word into CPU         }
      AND    AX, 0FCFFh       { mask out precision control field   }
      OR     AX, [Precision]  { set desired precision in PC field  }
      MOV    [CtrlWord], AX   { store new control word             }
      FLDCW  [CtrlWord]       { set new precision control in NDP   }
    END;
  END; {SetPrecisionControl}

  BEGIN {main}
    FOR Precision := 1 TO 3 DO BEGIN
      B := 1.2345678901234567890;
      SetPrecisionControl (Precision);
      FOR L := 1 TO 20 DO BEGIN
        B := Sqrt (B);
      END;
      FOR L := 1 TO 20 DO BEGIN
        B := B*B;
      END;
      SetPrecisionControl (3); { full precision for printout }
      WriteLn (Precision, B:28);
    END;
  END.

The output of the above program looks like this when executed with a
coprocessor present:

  1  1.13311278820037842E+0000   (* single precision   *)
  2  1.23456789006442125E+0000   (* double precision   *)
  3  1.23456789012337585E+0000   (* extended precision *)

However, when executed with the emulator, output is as follows:

  1  1.23456789012351396E+0000
  2  1.23456789012351396E+0000
  3  1.23456789012351396E+0000

Changing the value of precision control obviously has no effect at all on
the emulator. It always works with extended precision in internal
calculations. This deviation of the emulator from a real coprocessor should
be documented in the TP 6.0 User Manual.
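Why the working precision matters so much for this particular computation
can be reproduced without a coprocessor. The Python sketch below is an
emulation under stated assumptions, not the PCtrlTst program itself: it
rounds every intermediate result to IEEE single by a round trip through
`struct`, and uses Python's native double for comparison. The helper names
are hypothetical, and the printed digits are not claimed to match the
coprocessor output above.

```python
import math
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest IEEE single and widen it again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sqrt_then_square(x: float, rnd, steps: int = 20) -> float:
    """Take 'steps' square roots, then square 'steps' times, rounding with rnd."""
    for _ in range(steps):
        x = rnd(math.sqrt(x))
    for _ in range(steps):
        x = rnd(x * x)
    return x


START = 1.2345678901234567890

single_result = sqrt_then_square(round_to_single(START), round_to_single)
double_result = sqrt_then_square(START, lambda v: v)

if __name__ == "__main__":
    print("single:", single_result)
    print("double:", double_result)
```

After twenty square roots the value sits within a couple of single-precision
units in the last place of 1, so the rounding there is magnified about a
million times on the way back; in double precision the same effect stays
many orders of magnitude smaller.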
{$N+,E+} PROGRAM RCtrlTst; VAR B: EXTENDED; RoundingMode, L: WORD; PROCEDURE SetRoundingMode (RCMode: WORD); (* This procedure selects one of four available rounding modes *) (* 0 - Round to nearest (default) *) (* 1 - Round down (towards negative infinity) *) (* 2 - Round up (towards positive infinity) *) (* 3 - Chop (truncate, round towards zero) *) VAR CtrlWord: WORD; BEGIN RCMode := RCMode SHL 10; { make mask for RC field in control word} ASM FSTCW [CtrlWord] { store NDP control word } MOV AX, [CtrlWord] { load control word into CPU } AND AX, 0F3FFh { mask out rounding control field } OR AX, [RCMode] { set desired precision in RC field } MOV [CtrlWord], AX { store new control word } FLDCW [CtrlWord] { set new rounding control in NDP } END; END; BEGIN FOR RoundingMode := 0 TO 3 DO BEGIN B := 1.2345678901234567890e100; SetRoundingMode (RoundingMode); FOR L := 1 TO 51 DO BEGIN B := Sqrt (B); END; FOR L := 1 TO 51 DO BEGIN B := -B*B; END; SetRoundingMode (0); { round to nearest for printout } WriteLn (RoundingMode, B:28); END; END. The calculations performed in the above program were selected so that every rounding mode would lead to a distinct final value. The output when run with a coprocessor appears below. As expected, four different values are printed at the end of the program if a coprocessor is present. 0 -1.23427629010100635E+0100 (* round nearest *) 1 -1.23427623555772409E+0100 (* round down *) 2 -1.23457760966801097E+0100 (* round up *) 3 -1.23397493540770643E+0100 (* chop *) With the emulator, four identical results are produced, indicating that the emulator does not support the IEEE rounding modes of the coprocessor. 0 -1.23457766383395931E+0100 1 -1.23457766383395931E+0100 2 -1.23457766383395931E+0100 3 -1.23457766383395931E+0100 This deviation from the behavior of the actual coprocessor should be mentioned in TP 6.0 documentation. 13. 
Deficiencies in Coprocessor emulator

The coprocessor emulator used by programs compiled in the $N+,E+ mode
when a coprocessor is absent at run-time does not correctly handle
special arguments like ZERO, INF, and NANs. Specific problems are:

- multiplication and division with INFs resulting in NANs instead
  of INFs
- 0+(-0) = -0, but (-0)+0 = 0
- operations on QNANs (quiet NaNs) signaling an exception

The following program will demonstrate the bugs:

{$N+,E+}
PROGRAM InfTest;

VAR INF, NEGINF: EXTENDED;
    QNAN, SNAN:  EXTENDED;
    X, NEGX:     EXTENDED;
    Z, NEGZ:     EXTENDED;
    PSEUDOZERO:  EXTENDED;
    INFA: ARRAY [1..5] OF WORD ABSOLUTE INF;
    QA:   ARRAY [1..5] OF WORD ABSOLUTE QNAN;
    SA:   ARRAY [1..5] OF WORD ABSOLUTE SNAN;
    PA:   ARRAY [1..5] OF WORD ABSOLUTE PSEUDOZERO;

BEGIN
   INFA [5] := $7FFF;
   INFA [4] := $8000;
   INFA [3] := $0000;
   INFA [2] := $0000;
   INFA [1] := $0000;
   QA [5] := $7FFF;
   QA [4] := $C000;
   QA [3] := $0000;
   QA [2] := $0000;
   QA [1] := $0001;
   SA [5] := $7FFF;
   SA [4] := $8000;
   SA [3] := $0000;
   SA [2] := $0000;
   SA [1] := $0001;
   PA [5] := $52FB;
   PA [4] := $0000;
   PA [3] := $0000;
   PA [2] := $0000;
   PA [1] := $0000;
   NEGINF := -INF;
   X := 5;
   NEGX := -5;
   Z := 0;
   NEGZ := -Z;
   WriteLn (' INF +  INF: ', INF + INF:6:0,       ' should be INF');
   WriteLn ('-INF + -INF: ', NEGINF + NEGINF:6:0, ' should be -INF');
   WriteLn (' INF +  X  : ', INF + X:6:0,         ' should be INF');
   WriteLn (' INF - -INF: ', INF - NEGINF:6:0,    ' should be INF');
   WriteLn ('-INF -  INF: ', NEGINF - INF:6:0,    ' should be -INF');
   WriteLn ('  X  -  INF: ', X - INF:6:0,         ' should be -INF');
   WriteLn (' INF *  INF: ', INF * INF:6:0,       ' should be INF');
   WriteLn ('-INF *  INF: ', NEGINF * INF:6:0,    ' should be -INF');
   WriteLn (' INF * -INF: ', INF * NEGINF:6:0,    ' should be -INF');
   WriteLn ('-INF * -INF: ', NEGINF * NEGINF:6:0, ' should be INF');
   WriteLn ('  X  *  INF: ', X * INF:6:0,         ' should be INF');
   WriteLn (' -X  *  INF: ', NEGX * INF:6:0,      ' should be -INF');
   WriteLn (' INF /  0  : ', INF / 0:6:0,         ' should be INF');
   WriteLn ('-INF /  0  : ', NEGINF / 0:6:0,      ' should be -INF');
   WriteLn ('  X  /  INF: ', X / INF:6:0,         ' should be 0');
   WriteLn (' INF / -X  : ', INF / NEGX:6:0,      ' should be -INF');
   WriteLn (' Sqrt (INF): ', Sqrt (INF):6:0,      ' should be INF');
   WriteLn ('  -0 + -0  : ', NEGZ + NEGZ:6:0,     ' should be -0');
   WriteLn ('   0 + -0  : ', Z + NEGZ:6:0,        ' should be 0');
   WriteLn ('  -0 +  0  : ', NEGZ + Z:6:0,        ' should be 0');
   WriteLn ('  -0 *  0  : ', NEGZ * Z:6:0,        ' should be -0');
   WriteLn ('   0 * -0  : ', Z * NEGZ:6:0,        ' should be -0');
   WriteLn ('  -0 *  X  : ', NEGZ * X:6:0,        ' should be -0');
   WriteLn ('   X * -0  : ', X * NEGZ:6:0,        ' should be -0');
   WriteLn ('  -X *  0  : ', NEGX * Z:6:0,        ' should be -0');
   WriteLn ('  -X * -0  : ', NEGX * NEGZ:6:0,     ' should be 0');
   WriteLn (' Sqrt (-0) : ', Sqrt (NEGZ):6:0,     ' should be -0');
   WriteLn ('QNAN * QNAN: ', QNAN * QNAN:6:0,     ' should be NAN');
   WriteLn ('QNAN + QNAN: ', QNAN + QNAN:6:0,     ' should be NAN');
   WriteLn ('QNAN / QNAN: ', QNAN / QNAN:6:0,     ' should be NAN');
   WriteLn ('Sqrt (QNAN): ', Sqrt (QNAN):6:0,     ' should be NAN');
END. { InfTest}

When run on an 80387 coprocessor, the output of the above program is
as follows:

 INF +  INF:    INF should be INF
-INF + -INF:   -INF should be -INF
 INF +  X  :    INF should be INF
 INF - -INF:    INF should be INF
-INF -  INF:   -INF should be -INF
  X  -  INF:   -INF should be -INF
 INF *  INF:    INF should be INF
-INF *  INF:   -INF should be -INF
 INF * -INF:   -INF should be -INF
-INF * -INF:    INF should be INF
  X  *  INF:    INF should be INF
 -X  *  INF:   -INF should be -INF
 INF /  0  :    INF should be INF
-INF /  0  :   -INF should be -INF
  X  /  INF:      0 should be 0
 INF / -X  :   -INF should be -INF
 Sqrt (INF):    INF should be INF
  -0 + -0  :     -0 should be -0
   0 + -0  :      0 should be 0
  -0 +  0  :      0 should be 0
  -0 *  0  :     -0 should be -0
   0 * -0  :     -0 should be -0
  -0 *  X  :     -0 should be -0
   X * -0  :     -0 should be -0
  -X *  0  :     -0 should be -0
  -X * -0  :      0 should be 0
 Sqrt (-0) :     -0 should be -0
QNAN * QNAN:    NAN should be NAN
QNAN + QNAN:    NAN should be NAN
QNAN / QNAN:    NAN should be NAN
Sqrt (QNAN):    NAN should be NAN

However, when run with the emulator, the program's output looks like
this:

 INF +  INF:    INF should be INF
-INF + -INF:   -INF should be -INF
 INF +  X  :    INF should be INF
 INF - -INF:    INF should be INF
-INF -  INF:   -INF should be -INF
  X  -  INF:   -INF should be -INF
 INF *  INF:    NAN should be INF     <----- error
-INF *  INF:    NAN should be -INF    <----- error
 INF * -INF:    NAN should be -INF    <----- error
-INF * -INF:    NAN should be INF     <----- error
  X  *  INF:    NAN should be INF     <----- error
 -X  *  INF:    NAN should be -INF    <----- error
 INF /  0  :    NAN should be INF     <----- error
-INF /  0  :    NAN should be -INF    <----- error
  X  /  INF:      0 should be 0
 INF / -X  :    NAN should be -INF    <----- error
 Sqrt (INF):    INF should be INF
  -0 + -0  :     -0 should be -0
   0 + -0  :     -0 should be 0       <----- error
  -0 +  0  :      0 should be 0
  -0 *  0  :     -0 should be -0
   0 * -0  :     -0 should be -0
  -0 *  X  :     -0 should be -0
   X * -0  :     -0 should be -0
  -X *  0  :     -0 should be -0
  -X * -0  :      0 should be 0
 Sqrt (-0) :     -0 should be -0
QNAN * QNAN: Runtime error 207 at 0000:09F8.   <----- error

This handling of QNANs also violates the IEEE-754 standard for binary
floating point arithmetic, which states in section 6.2 (Operations
with NaNs): "Every operation involving one or two input NaNs, none of
them signaling, shall signal no exception but, if a floating-point
result is to be delivered, shall deliver as its result a quiet NaN,
which should be one of the input NaNs".

14. Error in inline assembler

The inline assembler incorrectly accepts a type name as a variable
name for memory operands, if the type size is the same as the size of
the memory operand. The following program fragment is legal in the
current version of the inline assembler:

ASM
  ...
  MOV AX, [WORD]
  MOV AL, [BYTE]
  LES DI, [LONGINT]
  ...
END;

The address generated for the memory operands in these cases is
always 0. This bug should be fixed immediately.

15. Error in inline assembler

The PTR operator of the inline assembler will incorrectly accept an
expression of class register if the size of the register is the same
as the type it is cast to. The following statements are legal in the
current version of the inline assembler:

ASM
  ...
  MOV BYTE PTR AL, 5
  ADD WORD PTR AX, 5
  MOV WORD PTR ES, 6
  ...
END;

The assembler will generate memory references to memory location 0 in
these cases, e.g. MOV BYTE PTR [0], 5. The PTR operator should be
fixed to only accept expressions of class memory.

16. Possible problems arising out of the use of the SEG and OFFSET
    operators with local variables

Use of the SEG operator with local variables and parameters is
allowed in the inline assembler. One would expect that the SEG value
of a local (auto) variable is the value of the stack segment (SS),
just as the SEG value of a global (static) variable is the data
segment (DS, @Data). However, the value stored by the inline
assembler seems always to be 0. Applying the SEG operator to local
variables should either be forbidden, or the correct value should be
supplied by the assembler.
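As long as SEG yields 0 for locals, the segment can be taken from SS directly. The sketch below is only an illustration under the assumption stated above, namely that locals and parameters always live in the stack segment. SegOfLocal is a made-up helper name, and the comparison with Pascal's Seg function is there as a cross check only.

```pascal
PROGRAM LocalSeg;

FUNCTION SegOfLocal: WORD;       { hypothetical helper for illustration }
VAR Local, S: WORD;
BEGIN
   Local := 0;
   ASM
     MOV  AX, SS                 { locals are addressed relative to SS:BP }
     MOV  S, AX                  { used instead of MOV AX, SEG Local      }
   END;
   IF S <> Seg (Local) THEN      { cross check with Pascal's Seg function }
      WriteLn ('unexpected: SS differs from Seg (Local)');
   SegOfLocal := S;
END;

BEGIN
   WriteLn ('Segment of local variable: ', SegOfLocal);
END.
```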
The Programmer's Guide states that the OFFSET of local variables, parameters, and the @Result symbol is the offset relative to the framepointer of the entity in which they were declared. This works as expected, but there is a problem when nested procedures/ functions are used. Although the local variables of an outer procedural level are visible to inner procedures, no meaningful OFFSET value can be computed for these variables relative to the framepointer of the inner procedure. Therefore, the assembler should either forbid references to local variables declared at an outer level or generate the code necessary to address across the several levels of indirection involved. The following example illustrates the problem: FUNCTION Outer (R: WORD): WORD; { copies input R to function output } FUNCTION Inner (W: WORD): WORD; ASSEMBLER; {should copy R to func. output} ASM MOV AX, R { will not load R !!} END; BEGIN { Outer } R := Inner (R); { does *not* copy R to R !! } ASM MOV DI, OFFSET R { load offset of R relative to Outer's BP} MOV AX, [BP+DI] { load parameter R } MOV SI, OFFSET @Result { load offset of Outer's result } MOV [BP+SI], AX { store value of R into Outer's result } END; END; BEGIN WriteLn (Outer(5)); END. Instead of printing '5', as one would expect, this program prints a value like '9094' depending on the initial stack size set in the TP configuration. 17. Error when using the WITH directive with the inline assembler When accessing parts of a record using inline assembler from within a WITH block, the inline assembler doesn't correctly compute the addresses of the record's parts. It uses the offset of the record part within the record as the address. However, the base address of the record must be added to this value to get the correct address of the record part. Writing to record parts using the inline assembler from within a WITH block will destroy other data in the data segment. The following program illustrates the problem. 
It first initializes a record using the inline assembler without making use of a WITH block. It prints the contents of the record, then updates it using inline assembler from within a WITH block and print the record again. If the inline assembler worked correctly, two different printouts would be the result. Actually, the second record update doesn't change the record but destroys other data in the data segment. Therefore, the same data is printed out twice. {$A+,N-,I-,S-,R-,B-} PROGRAM WITHBug; VAR Student: RECORD ID: LONGINT; Name: STRING; GPA: REAL; END; BEGIN { first student: ID = 12345678, Name = JOHN, GPA = 1.0 } ASM MOV WORD PTR [Student.ID], 5678 MOV WORD PTR [Student.ID+2], 1234 MOV WORD PTR [Student.GPA], 81h MOV WORD PTR [Student.GPA+2], 0 MOV WORD PTR [Student.GPA+4], 0 MOV BYTE PTR [Student.Name], 4 MOV WORD PTR [Student.Name+1], 'OJ' MOV WORD PTR [Student.Name+3], 'NH' END; WriteLn ('Student''s ID: ', Student.ID); WriteLn (' Name: ', Student.Name); WriteLn (' GPA: ', Student.GPA:0:2); { second student: ID = 87654321, Name = JANE, GPA = 2.0 } WITH Student DO ASM MOV WORD PTR [ID], 4321 MOV WORD PTR [ID+2], 8765 MOV WORD PTR [GPA], 82h MOV WORD PTR [GPA+2], 0 MOV WORD PTR [GPA+4], 0 MOV BYTE PTR [Name], 4 MOV WORD PTR [Name+1], 'AJ' MOV WORD PTR [Name+3], 'EN' END; WriteLn ('Student''s ID: ', Student.ID); WriteLn (' Name: ', Student.Name); WriteLn (' GPA: ', Student.GPA:0:2); END. 
The following excerpts from the resulting code show the error: 1738:003B C70644002E16 MOV Word Ptr [0044],162E ; 1st record 1738:0041 C7064600D204 MOV Word Ptr [0046],04D2 ; initialization 1738:0047 C70648018100 MOV Word Ptr [0148],0081 ; using 1738:004D C7064A010000 MOV Word Ptr [014A],0000 ; correct 1738:0053 C7064C010000 MOV Word Ptr [014C],0000 ; addresses 1738:0059 C606480004 MOV Byte Ptr [0048],04 1738:005E C70649004A4F MOV Word Ptr [0049],4F4A 1738:0064 C7064B00484E MOV Word Ptr [004B],4E48 1738:00E4 C7060000E110 MOV Word Ptr [0000],10E1 ; 2nd record 1738:00EA C70602003D22 MOV Word Ptr [0002],223D ; initialization 1738:00F0 C70604018200 MOV Word Ptr [0104],0082 ; using offsets 1738:00F6 C70606010000 MOV Word Ptr [0106],0000 ; into record 1738:00FC C70608010000 MOV Word Ptr [0108],0000 ; instead of 1738:0102 C606040004 MOV Byte Ptr [0004],04 ; addresses 1738:0107 C70605004A41 MOV Word Ptr [0005],414A 1738:010D C70607004E45 MOV Word Ptr [0007],454E 18. Incorrect assembly of certain JMPs and CALLs by inline assembler The inline assembler will incorrectly assemble certain JMPS and CALLs that are invalid and are rejected by the MASM and TASM assemblers. It will also incorrectly assemble JMPs and CALLs to destination declared with the ABSOLUTE directive. The bugs can be demonstrated by the following program: PROGRAM Jmp_Call; VAR AbsPointer: POINTER ABSOLUTE $1234:$5678; NormPointr: POINTER; PROCEDURE FarProc; FAR; ASSEMBLER; ASM END; PROCEDURE NearProc; NEAR; ASSEMBLER; ASM END; BEGIN ASM JMP NEAR PTR AbsPointer { This is illegal in MASM / TASM } JMP FAR PTR AbsPointer { incorrectly assembled by inline assmbl.! 
} JMP AbsPointer { This is illegal in MASM / TASM } JMP NEAR PTR NormPointr { This is illegal in MASM / TASM } JMP FAR PTR NormPointr JMP NormPointr JMP NEAR PTR FarProc JMP FAR PTR FarProc JMP FarProc JMP NEAR PTR NearProc JMP FAR PTR NearProc JMP NearProc CALL NEAR PTR AbsPointer { This is illegal in MASM / TASM } CALL FAR PTR AbsPointer { incorrectly assembled by inline assmbl.! } CALL AbsPointer { This is illegal in MASM / TASM } CALL NEAR PTR NormPointr { This is illegal in MASM / TASM } CALL FAR PTR NormPointr CALL NormPointr CALL NEAR PTR FarProc CALL FAR PTR FarProc CALL FarProc CALL NEAR PTR NearProc CALL FAR PTR NearProc CALL NearProc END; END. The instructions marked as "illegal in MASM / TASM" should be flagged as errors by the inline assembler. "JMP NEAR PTR AbsPointer" and "JMP NEAR PTR NormPointr" are near jumps to a different CS, and in "JMP AbsPointer", AbsPointer can't be addressed with the currently assumed registers. The same remarks apply to the equivalent CALL statements in the source. Consequently, the inline assembler produces garbage for the illegal statements. "JMP FAR PTR AbsPointer" should assemble to "JMP 1234:5678", but the inline assembler produces something very different. Instead of the absolute segment 1234h it uses the value of CS and in addition mangles the offset value. The assembly language program below, which is equivalent to the above PASCAL program, shows that "JMP FAR PTR AbsPointer" and "CALL FAR PTR AbsPointer" can be assembled correctly by TASM / MASM, so the inline assembler should do this as well. DOSSEG AbsSeg SEGMENT AT 1234h ORG 5678H AbsPointer DD ? AbsSeg ENDS DATA SEGMENT WORD PUBLIC 'DATA' ASSUME DS:DATA NormPointr DD ? DATA ENDS CODE SEGMENT BYTE PUBLIC 'CODE' ASSUME CS:CODE, DS:DATA FarProc PROC FAR RET FarProc ENDP NearProc PROC NEAR RET NearProc ENDP Main: MOV AX, SEG (NormPointr) MOV DS, AX ; JMP NEAR PTR AbsPointer ; error ! JMP FAR PTR AbsPointer ; JMP AbsPointer ; error ! 
; JMP NEAR PTR NormPointr ; error ! JMP FAR PTR NormPointr JMP NormPointr JMP NEAR PTR FarProc JMP FAR PTR FarProc JMP FarProc JMP NEAR PTR NearProc JMP FAR PTR NearProc JMP NearProc ; CALL NEAR PTR AbsPointer ; error ! CALL FAR PTR AbsPointer ; CALL AbsPointer ; error ! ; CALL NEAR PTR NormPointr ; error ! CALL FAR PTR NormPointr CALL NormPointr CALL NEAR PTR FarProc CALL FAR PTR FarProc CALL FarProc CALL NEAR PTR NearProc CALL FAR PTR NearProc CALL NearProc CODE ENDS STACK SEGMENT STACK DB 100h DUP (?) STACK ENDS END MAIN 19. Other bugs in the inline assembler (ASM directive) Several instructions that have a format with an immediate operand support senseless or incorrect ranges for the immediate value. The IN, OUT, and INT instructions will accept values between -128 and 255. Since negative values make no sense here, the possible range should be restricted to 0 to 255. The ENTER instruction will also take negative arguments. Again, this is not a very sensible choice. How are -5 bytes reserved for local variables? The allowed range for the arguments should be 0 to 255 and 0 to 65535, respectively. Although it is not officially documented by Intel, the AAM and AAD instructions may take additional arguments that indicate the base on which to perform the conversion. This is supported by the inline assembler. However, it accepts arguments between -128 and 127 while it should accept bases between 0 and 255, since the bases available with AAM and AAD must be positive. The inline assembler performs no check on the index in the stack top relative addressing mode of the coprocessor. Very large or even negative values are allowed. For example, FADD ST, ST(123456) will be accepted as perfectly legal. This must be fixed to make sure the index is between zero and seven. There is no way to code an absolute far jump such as JMP F000:FFF0 (to perform a warm start). The same restriction applies to far calls. 
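As a stopgap, the bytes of such a jump can be emitted with the DB and DW directives of the inline assembler. The following is a sketch only: it relies on the standard encoding of the direct intersegment jump (opcode EAh followed by offset and segment), JumpToRestart is a made-up name, and calling the routine restarts the machine, so it is shown here without a caller.

```pascal
PROCEDURE JumpToRestart; ASSEMBLER;   { hypothetical name }
ASM
   DB  0EAh                 { opcode of JMP ptr16:16 (direct far jump) }
   DW  0FFF0h               { target offset                            }
   DW  0F000h               { target segment, i.e. JMP F000:FFF0       }
END;
```

A far call could be hand coded the same way with opcode 9Ah in place of EAh.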
In a conventional assembler such a jump could be coded as follows:

BIOS    SEGMENT AT 0F000h
        ORG     0FFF0h
Restart LABEL   FAR
BIOS    ENDS

CODE    SEGMENT BYTE PUBLIC 'CODE'
        JMP     Restart
CODE    ENDS

There are no segment declarations available with the inline
assembler, so the jump has to be either hand coded with DBs or
changed to a memory indirect far jump using an appropriately
initialized pointer. This problem should be documented in the
Turbo-Pascal manuals.

One of the standard syntax forms available with the IMUL instruction
is not accepted by the inline assembler. IMUL reg16, immed8 is not
allowed; rather, the inline assembler expects this to be coded as
IMUL reg16, reg16, immed8, where the two registers are identical.
The form IMUL reg16, immed8 is commonly accepted by assemblers and is
also listed in Intel's documentation. Therefore, this syntax should
be supported by the inline assembler.

Some 286 protected mode instructions (LLDT, LMSW, LTR, SMSW, VERR,
VERW) when used with memory operands require the use of the PTR
directive to establish operand size with an untyped operand (e.g.,
[BX+SI]). Two other instructions, SGDT and SIDT, do not require the
use of PTR in these cases. This usage is inconsistent. Since the
operand size can be deduced from the instruction itself (just as can
be done in the case of a MOV AX, [BX]), no PTR directive at all
should be required. Likewise, the POP mem16 instruction should not
require a WORD PTR directive with an untyped memory operand, since
memory operand size is obvious from the instruction.

20. Errors / problems / documentation deficiencies using coprocessor
    instructions with the new ASM directive of TP 6.0

In the $G+,N+ compiler mode the inline assembler does not assemble
coprocessor instructions into emulator interrupts regardless of the
$E switch setting. Instead it always generates optimized coprocessor
instructions (without inserted WAITs). This causes programs compiled
with $G+,N+,E+ to fail if no coprocessor is present.
The assembler must ensure that the $E switch is off before performing
this optimization.

Coprocessor instructions in the no-wait form (e.g. FNINIT, FNSTSW,
FNSTCW, FNSTENV, FNCLEX) are not encoded into emulator interrupts,
since it makes no sense to use them with the emulator, which cannot
work in parallel with the CPU. This may lead to problems if
programmers are not aware of the fact that these instructions will
have absolutely no effect in an emulator environment. Since it is
desirable to have the no-wait instructions available, programmers
should be warned by the documentation not to use them in programs or
routines that may be executed by the emulator, or to explicitly code
around this problem by using the system variable Test8087 (which is
0 if no coprocessor was detected). An example of a workaround
follows.

ASM
  .                           { some other code }
  .
  CMP   Test8087, 0           { coprocessor present ? }
  JE    @Emulate              { no, do specific code for emulator }
  FNINIT                      { can be safely used with 8087 }
  JMP   @Continue             { skip emulator code }
@Emulate:
  FINIT                       { this can be emulated }
@Continue:                    { continue with more code }
  .
  .
END;

21. TP wrongly flags coprocessor instruction as 286 specific

Turbo Pascal's inline assembler BASM will not allow one to assemble
the coprocessor instruction FLDLN2 (load Ln(2) to TOS). During
compilation it gives compile time error 159, "286/287 instructions
are not enabled". However, FLDLN2 is by no means 286 specific; it is
included in the Intel documentation for the 8087 (see for example
"Microprocessors, Volume 1", page 2-141, Intel 1991). TP 6.0 handles
FLDLN2 correctly in $G+ mode.

{$N+,E-,G-}
PROGRAM FLDBUG;       { will not compile under TP 6.0 }
BEGIN
   ASM
     FLDLN2           { TP 6.0 refuses to compile this in $G- mode }
     FSTP ST(0);
   END;
END.

22. Inconsistent error messages emitted by inline assembler

When constants that are out of range are supplied to assembler
instructions that take some kind of immediate operand, two different
error messages are emitted depending on the type of the destination
operand.
If the destination operand is a byte operand, as in ADD AL, 256, the
compilation will result in error 155, 'Invalid combination of opcode
and operands'. However, if the destination is a word operand, as in
ADD AX, 65536, the resulting error will be #76, 'Constant out of
range'. This discrepancy should be resolved by always emitting the
'Constant out of range' error when an immediate value is not within
the limits called for by the destination operand.

Instructions that require one of their operands to be a memory
reference (BOUND, LDS, LES, LEA, SGDT, SIDT, LGDT, LIDT) should cause
compile error 156 (memory reference expected) to be emitted when a
register is supplied instead of a memory reference. This will give a
more detailed description of the error than the currently used error
155 (invalid combination of opcode and operands).

There are space saving sign extending encodings available for the OR,
AND, and XOR instructions that the inline assembler fails to use.
These encodings are the equivalents of the sign extending encodings
used with the ADD, ADC, SUB, SBB, and CMP instructions. A list of the
additional instructions follows:

Instruction          | Encoding
---------------------+-------------------------------------------
OR  reg16, const8    | 83 mod 001 r/m data8
OR  mem16, const8    | 83 mod 001 r/m (disp) (disp) data8
AND reg16, const8    | 83 mod 100 r/m data8
AND mem16, const8    | 83 mod 100 r/m (disp) (disp) data8
XOR reg16, const8    | 83 mod 110 r/m data8
XOR mem16, const8    | 83 mod 110 r/m (disp) (disp) data8

23. Compiler Switch /V doesn't export names of SYSTEM routines

When using the /V switch of TPC or choosing standalone debugging
within the Turbo Pascal IDE, all public identifiers are supposed to
be included in the EXE file for debugging purposes. However, the
names of just about every routine from the SYSTEM unit are not
included, although the variables (such as HeapOrg) are included.
Among the few exceptions are the MemAvail and MaxAvail routines,
which are sometimes included in the debug information. This bug is
very annoying when programs are profiled with the Turbo Profiler and
one wants to know how much time the program spends in certain SYSTEM
routines. Also, when debugging programs with Turbo Debugger one would
rather like the disassembly to display a call as e.g.
CALL SYSTEM.LONGMUL instead of a cryptic CALL 152E:05B8, which makes
the disassembled code hard to follow. I therefore urge Borland to
ensure correct inclusion of *all* public symbols in the debug
information generated by the /V switch.

24. Problem with AAM xx and AAD xx instructions when stepping/tracing
    through inline ASM code with Turbo-Pascal's built-in debugger

The inline assembler correctly allows parameters with the AAM and AAD
instructions. Although this feature is not officially documented by
Intel, it works on all 80x86 processors and compatibles, such as
NEC's V30. The inline assembler will correctly assemble an
instruction like AAM 16, which is quite useful when one wants to
print a number in hexadecimal format. Turbo-Pascal's internal
debugger does not recognize AAM opcodes other than the plain AAM
opcode. When stepping/tracing through inline assembly code, it seems
to skip the instruction, causing the program to behave differently
than in an ordinary run. For example, the following instruction
sequence will give 0505 in AX when run on any 80x86, but will give
0055 in AX when stepped through with Turbo-Pascal's internal
debugger.

  .
  .
  MOV  AX, 0055h
  AAM  16
  .                     { AX should contain 0505h now }

Another example involving the AAD instruction will give 0066h in AX
when simply run, but AX will contain 00CEh when the code is stepped.

  .
  .
  MOV  AX, 0606h
  AAD  16
  .                     { AX should contain 0066h now }

25. Error in heap manager (GetMem, New)

Turbo-Pascal 6.0 allows memory allocation functions to allocate data
structures of more than 65528 bytes on the heap.
Data structures on the heap of size greater than 65528 bytes may
cause segment wrap-around, thereby destroying other data on the heap
or causing a general protection exception on processors from the
80286 on upwards. This general protection exception #GP(0) is
triggered when a word is accessed at offset FFFFh in a segment, even
when the processor is in real mode. With no valid #GP(0) handler
present, the system will crash upon returning from the INT 0Dh
service routine, since the exception has pushed an error code *after*
pushing the return address, and without a valid #GP(0) handler this
error code is not removed from the stack before the INT 0Dh routine
executes its IRET. 386 memory managers like QEMM or the DOS box of
Windows in 386 Enhanced mode catch a #GP(0) exception, but plain DOS,
even MS-DOS 5.0, crashes. The following program illustrates the
problems:

PROGRAM HeapBug;

TYPE SpcRecord  = RECORD
                     W1: WORD;
                     W2: WORD;
                     B1: BYTE;
                  END;
     SmallArray = ARRAY [1..8] OF CHAR;
     BigArray   = ARRAY [1..65535] OF CHAR;
     SpcArray   = ARRAY [1..13107] OF SpcRecord;

VAR  P1  : ^SmallArray;
     P2  : ^BigArray;
     P3  : ^SpcArray;
     HPtr: POINTER;

BEGIN
   HPtr := HeapPtr;            { save initial value of heap pointer }
   WHILE HeapPtr = HPtr DO BEGIN
      New (P1);                { use up blocks in freelist }
   END;
   IF Ofs (HeapPtr^) <> 8 THEN
      New (P1);                { make sure large array will have ofs of 8 }
   New (P2);
   FillChar (P1^, 8, 'A');     { initialize 1st array }
   FillChar (P2^, 65534, 'B'); { initialize 2nd array -> trashes 1st array }
   IF P1^[6] <> 'A' THEN       { chk if 1st array's integrity was violated }
      WriteLn ('First array trashed!');
   P3 := Pointer (P2);
   P3^[13106].W2 := $55AA;     { access at ofs FFFF causes #GP(0) on 80286 }
END.

The problem here is that 80x86 segments start at 16-byte boundaries
(paragraph boundaries), while allocation of data structures on the
heap is aligned at 8-byte boundaries.
If a data structure in the heap has a start address with an offset of
8 and is greater than 65528 bytes, accessing the very last bytes of
that data structure will cause undesired segment wrap-around.
Therefore, the maximum allowed allocation for data structures on the
heap should be 65528 bytes.

26. Logical error in GRAPH.TextWidth function

The TextWidth function delivers incorrect results when fonts are
scaled with the SetUserCharSize procedure. To compute the width of
the string passed to it, the TextWidth function adds the widths of
all characters in the string. Depending on the current setting of the
Direction parameter within GRAPH, the resulting value is then
multiplied and divided by either the MultX and DivX or the MultY and
DivY scaling factors. If these scaling factors are not unity, this
method will compute the wrong text width. Since text justification
using the OutTextXY and SetTextJustify procedures relies on the
TextWidth function for computing the starting position for string
output, this output is not correctly justified. The TextWidth
function, when used with user supplied font scaling factors, usually
returns a width that is bigger than the actual width of the string.

The correct way to compute text width is to compute the actual size
of every character in the string using the scale factors supplied by
the user and add these values up. An example: Suppose we want to
compute the width of the string 'World'. Assume that the unscaled
widths of the characters as taken from the font information are 10,
7, 7, 5, 7, the output direction is horizontal, and the scale
factors are MultX = 5 and DivX = 8. The current implementation of
TextWidth would compute the width as

   ((10+7+7+5+7) * 5) DIV 8 = (36 * 5) DIV 8 = 180 DIV 8 = 22.

A correct implementation, however, would calculate the width as
follows:

   (10*5) DIV 8 + 3 * ((7*5) DIV 8) + (5*5) DIV 8 = 6 + 12 + 3 = 21.

This version is correct since it uses character sizes as used in the
OutText and OutTextXY procedures.
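The arithmetic of the 'World' example can be checked with a few lines of Pascal. This is only a sketch of the two summation orders, not the code of the GRAPH unit: the widths and scale factors are the assumed values from the example, and SumThenScale and ScaleThenSum are names invented here.

```pascal
PROGRAM WidthDemo;
{ Sketch only: CharWidth holds the assumed unscaled widths of W,o,r,l,d }

CONST NChars = 5;
      CharWidth: ARRAY [1..NChars] OF INTEGER = (10, 7, 7, 5, 7);
      MultX = 5;                 { scale factors as in the example }
      DivX  = 8;

FUNCTION SumThenScale: INTEGER;  { add all widths, scale the total }
VAR I, Sum: INTEGER;
BEGIN
   Sum := 0;
   FOR I := 1 TO NChars DO
      Sum := Sum + CharWidth [I];
   SumThenScale := (Sum * MultX) DIV DivX;
END;

FUNCTION ScaleThenSum: INTEGER;  { scale every character, then add up }
VAR I, Sum: INTEGER;
BEGIN
   Sum := 0;
   FOR I := 1 TO NChars DO
      Sum := Sum + (CharWidth [I] * MultX) DIV DivX;
   ScaleThenSum := Sum;
END;

BEGIN
   WriteLn ('sum, then scale: ', SumThenScale);   { 22 }
   WriteLn ('scale, then sum: ', ScaleThenSum);   { 21 }
END.
```

The difference comes from truncation: scaling each character separately discards a fraction per character, while scaling the total discards only one.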
27. Length of descender not taken into account by SetTextJustify

If text is to be written at the very bottom of the current graphics
window, one uses SetTextJustify (AnyMode, BottomText) and
OutTextXY (AnyX, ViewMaxY, AnyText) to accomplish that. However, if
the text contains letters that descend below the base line,
the descenders are outside the window and are clipped off. If one
wants to output text in the manner described, this is very annoying,
since the programmer has to adjust the Y-coordinate himself according
to the font size in effect. The same problem occurs if text is to be
written horizontally at the very right of the graphics window.
Obviously, the TextHeight function used by the SetTextJustify
procedure does not account for descender length. To fix the problem
described, justification should be changed to account for the overall
height of characters including descenders.

28. "Snow" prevention fails on CGA due to unsafe algorithm

The internal DirectWrite routine of module CRT is designed to prevent
"snow" when writing directly to the CGA screen. However, a logical
error keeps this snow checking from being 100% safe. The same
criticism applies to the WriteView method in module VIEWS of Turbo
Vision. The following is an excerpt from CRT.DirectWrite:

        .
        .
@@2:    LODSB           ; 1; get char
        MOV   BL,AL     ; 2;
@@3:    IN    AL,DX     ; 3; wait until out of current horiz. sync,if in
        TEST  AL,1      ; 4;
        JNE   @@3       ; 5;
        CLI             ; 6;
@@4:    IN    AL,DX     ; 7; wait until next horiz. sync starts
        TEST  AL,1      ; 8;
        JE    @@4       ; 9;
        MOV   AX,BX     ;10;
        STOSW           ;11; write to screen
        STI             ;12;
        LOOP  @@2       ;13;
        .
        .

If an interrupt occurs after line 3 and before line 6 in the above
code fragment, the program will *not* wait for the *start* of the
horizontal sync but only test if the CGA is *in* a horizontal sync
upon returning from the interrupt service routine.
Since horizontal sync allows only for the output of exactly one
character if output starts at the very beginning of horizontal sync,
there is a good chance that the above program writes to the screen
after the horizontal sync has been completed, thereby causing the CGA
to "snow". Of course, failure of the above code to prevent "snow" is
only noticeable in a system with very high interrupt rates, e.g. one
running serial communication as a background TSR. One additional
disadvantage of the above code is that it makes use of the horizontal
sync period only, although this is much shorter than the vertical
retrace period.

The following enhanced code is 100% safe in preventing snow and uses
both the vertical and horizontal retrace periods. It has been tested
on an original IBM CGA. Interrupt latency is only marginally higher
than with the original code and still allows interrupt driven serial
communication to run at the highest possible rate of 115200 baud.

DirectWrite:
          CMP   SI, DI               ; start address = end address ?
          JE    EmptyStr             ; yes, nothing to write
          PUSH  CX                   ; save
          PUSH  DX                   ; registers
          PUSH  DI                   ; that
          PUSH  DS                   ; must be
          PUSH  ES                   ; preserved
          MOV   CX, DI               ; string end address
          SUB   CX, SI               ; number of characters to write
          MOV   DL, CheckSnow        ; get flag for snow check
          MOV   DH, TextAttr         ; get current attribute
          XOR   AX, AX               ; address BIOS data area
          MOV   DS, AX               ; via segment 0
          MOV   AL, DS:CrtWidth+400h ; width of scan line in current mode
          MUL   BH                   ; multiply by cursor y-position
          XOR   BH, BH               ; clear hi-byte to prepare for addition
          ADD   AX, BX               ; add cursor x-position
          ADD   AX, AX               ; two screen bytes for every character
          XCHG  AX, DI               ; offset into screen memory to DI
          MOV   AX, DS:Addr6845+400h ; get 6845 base address
          ADD   AX, 6                ; 6845 status port
          XCHG  AX, DX               ; AX = CheckSnow/TextAttr, DX = port
          MOV   BX, 0B800H           ; screen segment for color modes
          CMP   DS:CrtMode+400h, 7   ; monochrome mode ?
          JNE   ColorMode            ; no, one of the color modes
          MOV   BH, 0B0H             ; screen at segment B000h if mono
ColorMode:PUSH  ES                   ; address character string
          POP   DS                   ; via DS
          MOV   ES, BX               ; extra segment addresses screen seg
          CLD                        ; autoincrement for string instruct.
          OR    AL, AL               ; CheckSnow = TRUE ? (AH=attribute)
          JE    OutLoop              ; no, don't check for snow
WriteChr: LODSB                      ; get character to write, AH = attrib
          XCHG  AX, BX               ; save character/attribute to write
WaitHor:  CLI                        ; interrupts disturb critical timing
          IN    AL, DX               ; read 6845 status
          TEST  AL, 8                ; in vertical retrace ?
          JNZ   WriteScr             ; yes, it is safe to write to screen
          TEST  AL, 1                ; in horizontal retrace ?
          JNZ   WaitHor              ; yes, wait until out of hor. retrace
WaitHor2: IN    AL, DX               ; read 6845 status
          TEST  AL, 1                ; horizontal or vertical retrace ?
          JZ    WaitHor2             ; no, wait until either kind of retr.
WriteScr: XCHG  AX, BX               ; in horiz. or vert. retrace: get ch
          STOSW                      ; write character and attribute
          STI                        ; interrupts ok now
          LOOP  WriteChr             ; write next character until all thru
          JMPS  WriteDone            ; screen write done
OutLoop:  LODSB                      ; get character to write
          STOSW                      ; write character and attribute
          LOOP  OutLoop              ; until all characters printed
WriteDone:POP   ES                   ; restore
          POP   DS                   ; destroyed
          POP   DI                   ; registers
          POP   DX
          POP   CX
EmptyStr: RET

29. GetDir doesn't report use of invalid drive number

The GetDir procedure should emit run time error 15 "Invalid drive
number" when passed an invalid drive number. However, the procedure
does not do the required check on the DOS return code and therefore
never raises run time error 15. Instead, it always returns the string
"X:\", where the X stands for any character in the IBM character set.
The bug can easily be fixed by adding a few lines of code to the
source module DIRH.ASM.
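Programs that cannot wait for a fixed run-time library may guard the call themselves. The sketch below is one possible workaround and not the fix proposed for DIRH.ASM. It assumes that DiskSize from the Dos unit returns -1 for an invalid drive number, and it may therefore also reject a valid drive that holds no disk. SafeGetDir is a made-up name.

```pascal
PROGRAM SafeDir;

USES Dos;

{ Returns TRUE and the current directory if the drive seems to exist.  }
{ Drive: 0 = default drive, 1 = A:, 2 = B:, ... as with GetDir.        }
FUNCTION SafeGetDir (Drive: BYTE; VAR Path: STRING): BOOLEAN;
BEGIN
   Path := '';
   IF DiskSize (Drive) < 0 THEN    { assumption: -1 means invalid drive }
      SafeGetDir := FALSE
   ELSE BEGIN
      GetDir (Drive, Path);
      SafeGetDir := TRUE;
   END;
END;

VAR PathName: STRING;

BEGIN
   IF SafeGetDir (27, PathName) THEN   { 27 is beyond drive Z: }
      WriteLn ('The path is ', PathName)
   ELSE
      WriteLn ('Invalid drive number');
END.
```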
The following program will demonstrate the bug:

PROGRAM GetDirBug;

VAR DriveNr:  INTEGER;
    PathName: STRING;

BEGIN
  REPEAT
    Write ('Enter Drivenumber (try also numbers > 100, 99 exits): ');
    ReadLn (DriveNr);
    GetDir (DriveNr, PathName);
    WriteLn('The path on drive ', DriveNr, ' is ', PathName);
  UNTIL DriveNr = 99;
END. {GetDirBug}


30. Help bug

Context sensitive help (Ctrl-F1) for the predefined arrays Port and
PortW is missing. There was no help for these arrays in TP5.5 either.


31. Problems with the file selector box in IDE

The history list of a file selector box contains only those files
that were selected by entering the file name in the input box, not
those selected by double clicking the name in the file list, which is
the standard way to select a file if the mouse is heavily used. Even
when working mainly with the mouse a history list is still useful,
since the desired files may be at the end of a file list 100 files
long and one has to get to the right part of the file list before
being able to double click the file name. By the way, this is also a
problem on the Apple Macintosh, since its file select boxes do not
have a history list feature at all. This can really be a pain in the
neck. It is therefore strongly recommended that all files that have
been selected with either method (that is, by entering the name in
the input box or by double clicking the name in the file list) be put
in the history list.


32. Possible problems in unit APP.PAS

APP.PAS contains an assembler function ISqr that computes the
integral part of the square root of its integer argument. This
function has several shortcomings. First of all, it should more
appropriately be named ISqrt. Then, for all arguments > 32760, it
will enter an endless loop (presumably because 181*181 = 32761 is the
largest square that fits into a signed 16-bit INTEGER, so the next
square, 182*182 = 33124, overflows and never compares greater than
the argument). Finally, it is not very fast, since it makes use of
the IMUL instruction. Unfortunately, it is not clear to me whether
the shortcomings pointed out pose any threat to program integrity.
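The behavior wanted for large arguments can be modelled in a few lines
of plain Pascal. This is only an illustrative sketch, not the APP.PAS
source: the names ISqrtCheck and ISqrtModel are made up here, and the
model relies on the fact that 1 + 3 + ... + (2n-1) = n*n, keeping the
running square in a WORD so that 182*182 = 33124 does not overflow. The
main program compares the result with the definition of the integral
square root for every argument from 0 to 32767.

```pascal
PROGRAM ISqrtCheck;
{ Hypothetical model, not the APP.PAS source: integer square root by }
{ summing odd numbers, since 1 + 3 + ... + (2n-1) = n*n.             }

FUNCTION ISqrtModel (I: INTEGER): INTEGER;
VAR
  Arg, Root, Odd, Square: WORD;  { unsigned: Square is 33124 at most }
BEGIN
  Arg    := I;                   { assumes I >= 0, as ISqr does }
  Root   := 0;
  Odd    := 1;                   { first odd number }
  Square := 1;                   { invariant: Square = (Root+1)**2 }
  WHILE Square <= Arg DO
  BEGIN
    Inc (Root);
    Inc (Odd, 2);                { next odd number }
    Inc (Square, Odd);           { next perfect square }
  END;
  ISqrtModel := Root;
END;

VAR
  N, R:   LONGINT;
  Errors: WORD;

BEGIN
  Errors := 0;
  FOR N := 0 TO 32767 DO
  BEGIN
    R := ISqrtModel (N);
    { R is correct if R*R <= N < (R+1)*(R+1) }
    IF (R * R > N) OR ((R + 1) * (R + 1) <= N) THEN
      Inc (Errors);
  END;
  WriteLn ('ISqrtModel (32767) = ', ISqrtModel (32767));
  WriteLn ('Mismatches found:    ', Errors);
END. {ISqrtCheck}
```

For the argument 32767 the model should report 181, the value at which
a signed 16-bit comparison of the next square would go wrong.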
If it is desirable to fix the function, the following substitute
could be used. It uses a more elegant and faster algorithm and
returns the correct result for all positive INTEGERs. The code length
is identical to that of the original routine ISqr.

{ ISqrt (I) computes INT (SQRT (I)), that is, the integral part of the }
{ square root of integer I. It does not check for negative arguments.  }
{ For all arguments 0..MaxInt the correct result is returned. The      }
{ algorithm exploits the following property:                           }
{                           n                                          }
{                 n**2 = Sigma (2i-1)                                  }
{                          i=1                                         }

FUNCTION ISqrt (I: INTEGER): INTEGER; ASSEMBLER;
ASM
      MOV   CX, I       { load argument }
      MOV   AX, -1      { init result }
      CWD               { init odd numbers to -1 }
      XOR   BX, BX      { init perfect squares to 0 }
@loop:INC   AX          { increment result }
      INC   DX          { compute }
      INC   DX          { next odd number }
      ADD   BX, DX      { next perfect square }
      CMP   BX, CX      { perfect square > argument ? }
      JBE   @loop       { until square greater than argument }
END;


33. Poor performance of REAL type arithmetic

Although this does not constitute a real bug, an analysis of the poor
performance of the REAL type arithmetic will be given. The rationale
here is that a 'TURBO product' should also deliver turbo performance
wherever it can be achieved. One obvious example showing that there
is ample room for speed improvements is the REAL-Sqrt function. It
takes more time to compute the square root to 12 decimal places than
the coprocessor emulator needs to compute the function result to 19
decimal places. I feel that such performance is unacceptable.
Unfortunately, there were no improvements in TP6.0 over TP5.5.
Improvements are also possible in the LONGINT arithmetic, especially
the division, which can be sped up by a factor of four to six
(depending on the CPU) when coded using the DIV instruction.

Performance can be enhanced by careful register scheduling within all
routines, thus avoiding unnecessary memory accesses. This measure
will also reduce the overall instruction count for a routine.
Wherever possible, time saving CPU instructions such as MUL or DIV
should be used. This will vastly improve performance, especially on
the 286, 386, and 486 CPUs.

Most important is the choice of the appropriate algorithm for each
function. Tests show that the REAL division uses the slowest of four
possible algorithms. This clearly indicates that not much time was
invested in finding short but fast algorithms. On the other hand, the
square root routine uses a basically fast algorithm (Newton's
iteration), but obliterates its advantages by poor implementation.
The transcendental functions are based on polynomial approximations.
It seems that no care was taken to find the shortest and most
accurate polynomials possible.

The speedups achievable by carefully recoding the complete REAL
arithmetic range from a few percent for simple functions like LONGINT
to REAL conversion up to a factor of 20 for the Sqrt function.


34. Inefficient string handling

The string handling operations Insert, Delete, and Pos have always
been implemented in a very simple but quite inefficient manner in
Turbo-Pascal. There were no improvements in Turbo-Pascal 6.0. Since
an acceleration of 300% - 400% can be achieved, this is hard to
accept.

*** Note: The above mentioned improvements have been realized in a
replacement for the original SYSTEM.TPU. The source has been made
available to BORLAND, but will not be given here. The library
replacement (not the source though) is available as TPL60N15.ZIP via
anonymous FTP from garbo@uwasa.fi


++++++++++++++++++++ Suggestions for enhancements ++++++++++++++++++++++++++

1. Suggested improvements for coprocessor / emulator arithmetic

The routine that patches the emulator interrupts (INT 34 to INT 3D)
back to coprocessor instructions at runtime if a coprocessor is
present always inserts WAITs (9Bh) before the coprocessor instruction.
However, for all coprocessors except the 8087 these WAITs are
unnecessary, since the 287 and 387 synchronize with the CPU at
hardware level, using ports F0h thru FFh. These WAITs can therefore
be replaced by NOPs, resulting in somewhat faster code. With this
simple change, performance improvements of up to 6% were observed in
programs that make heavy use of simple coprocessor instructions (e.g.
a linear equation solver). A new routine, which inserts NOPs instead
of WAITs where appropriate, is presented here.

CODE    SEGMENT BYTE PUBLIC 'CODE'

        ASSUME  CS:CODE

JMPS    EQU     <JMP SHORT>

;-------------------------------------------------------------
; PATCH87 is the routine responsible for converting emulator
; interrupts back to coprocessor opcodes if a coprocessor is
; detected by the startup code.
;
; This routine is 1 byte shorter than the original one and has
; been enhanced to generate NOPs instead of WAITs before each
; coprocessor instruction when the coprocessor is a 287 or 387.
;
; INPUT:     No input or output. The desired sideeffect is
; OUTPUT:    patching the code at run-time.
;
; DESTROYS:  -
;
; All rights reserved (c) 1988, 1989, 1990, 1991 Norbert Juffa
;
; Borland is free to use this code if desired !
;-------------------------------------------------------------

PATCH87 PROC    FAR
        PUSH    BP                    ; save TURBO-Pascal framepointer
        MOV     BP, SP                ; make new framepointer
        PUSH    AX                    ; save
        PUSH    SI                    ; destroyed
        PUSH    DS                    ; registers
        TEST    BYTE PTR [BP+7], 2    ; interrupts allowed before int ?
        JZ      $intdis               ; no
        STI                           ; yes, enable interrupts
$intdis:LDS     SI, [BP+2]            ; load return address
        DEC     SI                    ; point to int data
        MOV     AX, WORD PTR [SI]     ; get interrupt number & data
        DEC     SI                    ; point to patch
        SUB     AL, 34h               ; 34..3D --> 0..9
        CMP     AL, 9                 ; interrupt valid (between 0..9) ?
        JA      $invald               ; invalid interrupt
        JE      $fwait                ; interrupt $3D --> FWAIT
        CMP     AL, 8                 ; interrupt $3C ?
        JE      $spcial               ; yes, handle segment overrides
        ADD     AL, 0D8h              ; new opcode
$tst286:MOV     AH, AL                ; second byte of opcode
        MOV     AL, 90h               ; first byte is a nop
        PUSH    SP                    ; test if
        POP     BP                    ; 286 or
        CMP     SP, BP                ; higher
        JE      $patch                ; 286
        MOV     AL, 9Bh               ; convert nop to wait
$patch: MOV     WORD PTR [SI], AX     ; store new opcode
        MOV     BP, SP                ; address stack via BP
        MOV     WORD PTR [BP+8],SI    ; set new return address
$endptc:POP     DS                    ; restore
        POP     SI                    ; destroyed
        POP     AX                    ; registers
        POP     BP                    ; restore TURBO-Pascal frameptr
        IRET                          ; done
$fwait: MOV     AX, 9B90h             ; store FWAIT
        JMPS    $patch                ; patch it in
$spcial:TEST    AH, 20h               ; bit 5 set indicates spec. func.
        JNZ     $invald               ; not supported, invalid
        MOV     AL, AH                ; generate
        AND     AX, 07C0h             ; segment
        SHR     AL, 1                 ; override
        SHR     AL, 1                 ; byte
        SHR     AL, 1                 ; and
        XOR     AL, 18h               ; coprocessor
        ADD     AX, 0D826h            ; opcode
        MOV     BYTE PTR [SI+2],AH    ; set new opcode
        JMPS    $tst286               ; put in new opcode
$invald:JMPS    $endptc               ; no error handling, ignore
PATCH87 ENDP

CODE    ENDS

        END

Another optimization could be performed if a program is compiled in
the $N+,E- mode. Since no emulator is used anyhow, the compiler could
give up generating emulator interrupts and generate real coprocessor
instructions instead. On CPUs > 286 neither NOPs nor WAITs would have
to be inserted before NDP instructions. This would save space as well
as time.

Those functions that use the Borland shortcut interrupt 3Eh could
test which NDP is present whenever this interrupt is called. If
Test8087 = 3, the enhanced instructions (e.g. FSIN, FCOS) available
on the 387/486/287XL could be executed. There would be only minimal
timing overhead, but vast performance improvements on 386/486
machines. Since no elaborate argument reduction schemes are
necessary, the additional code would be quite short.

The Borland shortcut interrupt provides some functions not accessible
from Turbo-Pascal 6.0.
These functions are the tangent Tan (subcode F0h), the dyadic
logarithm Ld (subcode F6h), the common logarithm Log (subcode F8h),
power of two (subcode FCh), and power of ten (subcode FEh). Tests
show that these undocumented functions are provided with a
coprocessor as well as with the emulator and are fully operational.
These functions should be made available to programmers through the
SYSTEM unit and be documented. Tan in particular is quite useful,
since it takes only 40% of the time of the equivalent construct
Sin/Cos.


2. Inclusion of LOADALL in inline assembler

Since the undocumented AAM xx and AAD xx instructions are provided by
the inline assembler, the undocumented LOADALL instruction (opcode
0F05h) could be provided as well when the compiler is in $G+ mode.
The Turbo-Debugger will correctly disassemble LOADALL.


3. Suggestions regarding 286 code generation feature ($G+)

Programs compiled with the $G+ switch will have reduced memory
requirements and will execute somewhat faster on a 286/386/486 CPU.
Typically, memory and time savings will not exceed 2%. Additionally,
setting the $G switch on will allow the use of real and protected
mode 286 instructions.

As explained in section five of the README file, programs compiled
with $G+ will not check for the presence of an appropriate processor
at runtime. It is strongly recommended that this behavior be changed.
At least two cases are known (one involving Borland's biggest
competitor) where programs were shipped that had been compiled with a
286 switch setting. Customers using them on PC type machines were
puzzled when they discovered that programs crashed on their systems
although they had performed flawlessly on their office computer.
Finally someone found the bug by tracing the program with a debugger.
To avoid such unpleasant confusion, programs compiled with $G+ should
execute a short routine at startup to determine if a 286 or later
processor is present.
If this is not the case, it should emit an error message and abort
the program, just as programs compiled with $N+,$E- abort if they
fail to detect a coprocessor.

Since 286 real mode instructions can also be executed on NEC's
V20/V30 processors and on the 80186/188, it might be desirable to
have a 186 code generation feature. This would effectively split the
$G switch into two separate switches. No changes would have to be
made to the code generator, since it generates no 286 protected mode
instructions. Thus, generated code would be the same with either the
186 or the 286 switch on. However, the inline assembler would only
recognize protected mode instructions when the 286 switch is on. This
would allow maximum utilization of the 286 real mode instructions and
a run time check for the CPU at the same time. Below is some code
that can be used to distinguish between 8086/8088,
80188/186/V20/V30, and 80286/386/486.

;--------------------------------------------------------------------
; CPU_Test distinguishes between three groups of CPUs commonly used
; in computers and returns an associated code for each.
;
; OUTPUT:  AX = 0  Group #0 may execute 8086 code only (8086/8088)
;          AX = 1  Group #1 may additionally execute 286 real mode
;                  instructions (V20/V30, 80186/80188)
;          AX = 2  Group #2 may additionally execute 286 protected
;                  mode instructions
;--------------------------------------------------------------------

CPU_Test PROC   FAR
         PUSH   SP           ; test updating
         POP    AX           ; of stackpointer
         CMP    AX, SP       ; stackpointer updated before push ?
         JE     @Grp2        ; no, must be 286, 386 or 486
         CLC                 ; make sure carry clear
         PUSHA               ; PUSHA executed on 88/86 as JMP $+2
         STC                 ; carry set if V20/V30 or 186/188
@8086:   JC     @Grp1        ; yes, its group #1
         XOR    AX, AX       ; CPU is 8088/8086
         RET                 ; done
@Grp1:   POPA                ; remove pushed bytes
         MOV    AX, 1        ; CPU is V20/V30 or 80186/80188
         RET                 ; done
@Grp2:   MOV    AX, 2        ; CPU is 286/386/486
         RET                 ; done
CPU_Test ENDP


4.
Suggestions for enhancements in the code generator

4.1 Enhancing procedure entry/exit code in $G+ mode (286 code
    generation)

When a procedure/function does not use local variables, the standard
exit code in $G- mode is:

    POP   BP
    RET

This is replaced by the following code in $G+ mode:

    LEAVE
    RET

However, for procedures/functions that have no local variables, it
would be advantageous to always use the first sequence in either
mode, $G- and $G+. Although both sequences take the same number of
clock cycles on 286 and 386 processors, the first is considerably
faster on the 486. Since the code generator already checks if no
local variables are declared to generate optimized entry code in $G+
mode, the optimized exit code could be generated just as easily.

Although the use of the ENTER imm16, 0 instruction does produce
shorter code when a procedure/function has both parameters and local
variables, the equivalent but longer (two or three bytes more)
standard procedure entry code will execute faster than ENTER on all
Intel processors. Therefore, it should be considered whether it is
really desirable to use ENTER at all. As tests indicate, a lot of
programs really do run slower on a 386DX machine if compiled with $G+
instead of $G-.

processor  | ENTER imm16, 0     | standard entry sequence
-----------+--------------------+------------------------
80286      | 11 clocks          | 3 + 2 + 3 = 8 clocks
80386      | 10 clocks          | 5 + 2 + 2 = 9 clocks
80486      | 14 clocks          | 1 + 2 + 1 = 4 clocks


4.2 Optimizing entry code for non nested procedures without
    parameters and local variables

If a procedure/function neither takes any parameters nor declares any
local variables and is not statically nested within another
procedure/function, there is no need for any entry code. Turbo Pascal
performs this optimization only for assembler procedures, but skips
it for normal procedures, probably so that nested and non-nested
procedures can use the same branch of the code generator.
The code generator could be enhanced to generate procedure entry code
only for those procedures that are either statically nested (and thus
have a hidden parameter, namely the framepointer of the preceding
procedure in the static chain), take parameters, or declare local
variables.


5. Suggestions for IDE

The status line for the edit mode should be enhanced to include the
shortcuts F5 Zoom and F6 Next. These additional hints will fit
exactly into the remaining space. When the IDE is in the
stepping/debugging mode, the shortcuts F4 Goto Cursor and Ctrl-F9 Run
should be added to the status line. This would accelerate debugging
sessions, since all program flow control could be exerted using
simple mouse clicks on the status line.


6. Suggestion regarding TURBO command-line options

There should be a help switch like /? or /Help on the Turbo-Pascal
Programmer's Platform command line that displays a help screen which
describes the other command-line switches that are available and
explains what they do.