Performance Mode Definition

In defining a new floating-point execution environment there are several goals:

Give sufficient latitude to facilitate the design of all conceivable future high-performance processors.
Fully comply with the IEEE Standard via a combination of compiler, library, operating system and hardware.
Preserve the correct operation of a broad subset of existing applications compiled under the preexisting floating-point environment (which we now call Precise Exception Mode).
Provide a software-only solution to retrofit the new mode on existing hardware.

The first goal is important because we do not want to change the floating-point architecture with every implementation. The second goal matters because we want to continue to say that our machines have "IEEE arithmetic." The third goal gives our customers a smooth transition path, and the fourth lets them upgrade their existing machines.

Performance Mode is defined by omitting denormalized numbers from the IEEE Standard and by removing the requirement that floating-point exceptions be trapped precisely. Referring to Table 5-2, the behavior in Performance Mode of an operation that produces result values A through E is defined as follows.

Operation Results Using Performance Mode
Value              Input       Result      Flags
A: TooSmall        -           0 or minN   U=1, I=1
B: ExactDenorm     0 or minN   0 or minN   U=1, I=1
C: InexactDenorm   -           0 or minN   U=1, I=1
D: ExactNorm       D           D           U=0, I=0
E: InexactNorm     -           rnd(E)      U=0, I=1

Tiny results are mapped to either zero or the minimum normalized number, minN, depending on the current rounding mode. Note that the inexact flag I is set in case B because, although an exact denormalized representation of the value exists, it is not used. Denormalized input operands (case B) are likewise mapped to zero or minN. There are no inexact inputs, since such values cannot be represented. The normalized cases are identical to those in Precise Exception Mode.
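
The mapping in cases A through C can be modeled in a few lines. The following Python sketch works on IEEE double-precision values, where minN is `sys.float_info.min`, and takes a value that Precise Exception Mode would have delivered as a denormalized number. It assumes that round-to-nearest and round-toward-zero produce a signed zero, while rounding toward +infinity (for positive values) or toward -infinity (for negative values) produces +minN or -minN. That is one plausible reading of "depending on the current rounding mode", not a documented rule. The function name `perf_mode_flush` and the rounding-mode constants are hypothetical.

```python
import math
import sys

MIN_NORMAL = sys.float_info.min  # minN for IEEE double precision (2**-1022)

# Hypothetical names for the four IEEE rounding modes.
RN, RZ, RP, RM = "nearest", "toward_zero", "toward_plus_inf", "toward_minus_inf"


def perf_mode_flush(x, rounding):
    """Return (result, U, I) for a value that Precise Exception Mode would
    deliver as a denormalized number.

    ASSUMPTION: RN and RZ give a signed zero; RP gives +minN for positive
    values and RM gives -minN for negative values.
    """
    if math.isnan(x) or x == 0.0 or abs(x) >= MIN_NORMAL:
        # Zero, normalized, infinite or NaN: passed through unchanged;
        # flags for these cases are not modeled by this sketch.
        return x, False, False
    if rounding == RP and x > 0.0:
        result = MIN_NORMAL
    elif rounding == RM and x < 0.0:
        result = -MIN_NORMAL
    else:
        result = math.copysign(0.0, x)
    # Cases A-C: U=1 and I=1, even for an exact denormal (case B),
    # because the denormalized representation is not used.
    return result, True, True
```

The same function can be applied to a denormalized input operand before an operation, keeping only the returned value; this section states how such inputs are mapped but not which flags they set.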

All IEEE Standard floating-point exceptions are trapped imprecisely in Performance Mode. Regardless of whether the exceptions are enabled or disabled, the result register specified by the offending instruction is unconditionally updated as if all the exceptions were disabled, and the exception conditions are accumulated into the flag bits of the FSR (the floating-point control and status register).

There are two classes of exceptions in Performance Mode. First, if any flag bit (invalid operation, division by zero, overflow, underflow, inexact) and its corresponding enable bit are both set, an imprecise trap occurs at or after the offending instruction and no later than the next trap barrier. Second, if FS=0 (FS is a special control bit in the FSR), an imprecise trap occurs when a tiny result that would otherwise be represented as a denormalized number is mapped to zero or minN. FS=0 also causes an imprecise trap when a denormalized input operand is mapped to zero or minN.
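
The accumulation rule and the two trap classes can be expressed as bit tests on the FSR. This Python sketch assumes the conventional MIPS FCSR layout (flag bits 2 to 6, enable bits 7 to 11, FS in bit 24); those positions and the helper names `accumulate` and `imprecise_trap_pending` are assumptions for illustration, not taken from this section. Real hardware evaluates the condition itself at some point before the next trap barrier.

```python
# ASSUMED FSR layout, following the conventional MIPS encoding.
FLAG_SHIFT = 2      # flag bits occupy bits 2..6
ENABLE_SHIFT = 7    # enable bits occupy bits 7..11
FIELD_MASK = 0x1F
FS_BIT = 1 << 24

# Position of each exception within a 5-bit flag or enable field:
# inexact, underflow, overflow, division by zero, invalid operation.
I, U, O, Z, V = 0x01, 0x02, 0x04, 0x08, 0x10


def accumulate(fsr, exc_bits):
    """OR an operation's exception conditions into the sticky flag bits.
    The destination register has already been written as if all
    exceptions were disabled, so nothing else changes here."""
    return fsr | ((exc_bits & FIELD_MASK) << FLAG_SHIFT)


def imprecise_trap_pending(fsr, flushed_denormal):
    """Return True if an imprecise trap must occur before the next trap barrier."""
    flags = (fsr >> FLAG_SHIFT) & FIELD_MASK
    enables = (fsr >> ENABLE_SHIFT) & FIELD_MASK
    # Class 1: some flag bit and its enable bit are both set.
    if flags & enables:
        return True
    # Class 2: FS=0 and a denormalized result or operand was mapped to 0 or minN.
    return flushed_denormal and not (fsr & FS_BIT)
```

Setting FS therefore suppresses only the second class; enabled IEEE exceptions still trap imprecisely.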

A floating-point trap barrier is defined by a code sequence that begins with an instruction moving the FSR to an integer register and ends with an instruction that uses the integer register containing the FSR contents. Any number of other instructions may appear in between, provided none of them is a floating-point computation instruction (that is, none of them can set flag bits). All imprecise floating-point traps caused by instructions before the barrier are guaranteed to have occurred by the conclusion of the barrier, and at that point the flag bits accurately reflect the accumulated results of all floating-point instructions before the barrier. The barrier is defined this way to give implementations maximum flexibility in overlapping integer and floating-point operations: serialization of the two units is deferred as late as possible to avoid performance loss.
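
The ordering guarantee of a trap barrier can be illustrated with a toy model in which operations write their results immediately but report flags and traps only when they retire, possibly much later. In this Python sketch, `move_fsr_to_int` stands for the first instruction of the barrier, and the function it returns stands for the later instruction that uses the integer register; only that use forces pending operations to drain. The class, its methods and the two-bit flag encoding are hypothetical and say nothing about how the R8000 actually schedules retirement.

```python
from collections import deque

U, I = 0x02, 0x01  # hypothetical underflow and inexact flag bits for this model


class ToyImpreciseFPU:
    """Toy model of Performance Mode: results are written at once, but flag
    updates and traps are reported only when an operation retires."""

    def __init__(self, enables=0):
        self.flags = 0          # accumulated flag bits
        self.enables = enables  # enable bits
        self.pending = deque()  # issued operations whose exceptions are unreported
        self.traps = []         # ids of operations that have trapped

    def issue(self, op_id, exc_bits):
        self.pending.append((op_id, exc_bits))

    def retire_oldest(self):
        op_id, exc = self.pending.popleft()
        self.flags |= exc
        if exc & self.enables:
            self.traps.append(op_id)  # imprecise: taken after op_id was issued

    def move_fsr_to_int(self):
        """Barrier start. Nothing drains yet, so integer work may overlap."""
        def use():
            # Barrier end: every earlier operation retires, so all traps
            # have been taken and the flag bits are exact.
            while self.pending:
                self.retire_oldest()
            return self.flags
        return use


fpu = ToyImpreciseFPU(enables=U)
fpu.issue(1, 0)      # exact operation
fpu.issue(2, U | I)  # tiny result mapped to 0 or minN
fpu.issue(3, I)      # inexact normalized result
use_fsr = fpu.move_fsr_to_int()
# ... integer instructions may execute here, overlapping the FP work ...
fsr = use_fsr()      # barrier concludes
```

The design choice the section describes is visible here: the expensive drain is tied to the use of the value, not to the move, so unrelated integer work can proceed in the meantime.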

The cause bits of the FSR present a serious problem in Performance Mode. Ideally they would contain the result of the most recent floating-point operation, but this may be very difficult or expensive to implement when floating-point instructions are issued, or even completed, out of order. To maximize the chance that existing binaries run correctly while retaining full flexibility for future out-of-order implementations, the cause bits of the FSR are defined to be cleared by every floating-point operation. Future applications, however, should not examine the cause bits and should use the flag bits instead.

The selection of Performance or Precise Exception Mode is defined as a protected, kernel-only operation. This is necessary for several reasons. When existing binaries that operate correctly in Performance Mode are executed, the program must not accidentally switch to Precise Exception Mode. Since existing programs routinely clear the entire FSR when they only want to clear the rounding-mode bits, Performance Mode cannot be indicated by setting a bit in the FSR. Conversely, existing programs that must run in Precise Exception Mode must not accidentally switch to Performance Mode, so Performance Mode cannot be indicated by clearing a bit in the FSR either. Nor can a new user-accessible floating-point control register be used to indicate Performance Mode: if a new program running on an existing processor that does not support Performance Mode writes to this nonexistent control register, the effect on the floating-point unit is undefined. Finally, on the R8000 there are implementation restrictions on which instructions may precede and follow a mode change, so such changes can be made safely only by the kernel.
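
The argument against a user-visible mode bit in the FSR can be checked with a little bit manipulation. This Python sketch places a hypothetical `PERF_MODE_BIT` in the FSR (the bit position is invented for illustration) and assumes the rounding mode occupies the low two bits, as in the conventional MIPS layout. The legacy idiom of writing zero to the whole FSR silently drops the mode bit; if the mode were instead encoded by a cleared bit, the same idiom would switch a Precise Exception Mode program into Performance Mode.

```python
RM_MASK = 0x3            # rounding-mode field, bits 1..0 (assumed MIPS-style layout)
PERF_MODE_BIT = 1 << 31  # HYPOTHETICAL user-visible mode bit, rejected by the design


def legacy_reset_rounding(fsr):
    """What many existing binaries do: ignore the old value and clear the whole FSR."""
    return 0


def masked_reset_rounding(fsr):
    """What they would need to do to preserve the rest of the FSR."""
    return fsr & ~RM_MASK
```

Because existing binaries cannot be changed to use the masked form, the mode has to live outside the user-writable FSR, which is why the section makes mode selection a kernel-only operation.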