Many important loop preparation transformations involve reassociation of floating point values. See the discussion of floating point optimization above, especially the -OPT:roundoff option.
SWP normally must be careful during the initial and final iterations of a loop to not perform extra operations which might cause runtime traps. It must be similarly careful if early exits from a loop (that is, before the initially calculated trip count is reached) are possible. Turning off certain traps at runtime can give it more flexibility, producing better schedules and/or simpler wind-up/wind-down code. See the target environment option -TENV:X=n for general control over the exception environment.
DO i=1,n
a(i) = a(i-1) + 5.0
END DO
Without back-substitution, each iteration must wait for the previous iteration's add to complete, yielding a best case of 4 cycles per iteration on the R8000. Back-substitution can transform the loop to something like:
DO i=1,n
a(i) = a(i-8) + 40.0
END DO
With appropriate initialization, this version can achieve an effective rate of nearly two iterations per cycle.
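The "appropriate initialization" mentioned above can be sketched in C as follows. This is only an illustration, not the code the compiler generates; the function names recur_serial and recur_backsub are hypothetical, and the array is assumed to hold n+1 elements with a[0] as the seed. The first seven elements are computed with the original recurrence so that every a[i-8] read by the transformed loop is already defined.

```c
#include <stddef.h>

/* Original recurrence: a[i] = a[i-1] + 5.0 for i = 1..n.
   Each iteration depends on the one immediately before it. */
void recur_serial(double *a, size_t n)
{
    for (size_t i = 1; i <= n; i++)
        a[i] = a[i - 1] + 5.0;
}

/* Back-substituted form (hypothetical sketch): the dependence distance
   becomes 8, so eight independent chains of adds can overlap. */
void recur_backsub(double *a, size_t n)
{
    /* Initialization: a[1]..a[7] still use the original recurrence. */
    size_t head = n < 7 ? n : 7;
    for (size_t i = 1; i <= head; i++)
        a[i] = a[i - 1] + 5.0;

    /* Main loop: 8 * 5.0 = 40.0 is added to the value 8 elements back. */
    for (size_t i = 8; i <= n; i++)
        a[i] = a[i - 8] + 40.0;
}
```

For values that are exactly representable, both functions produce the same array. In general the two forms can round differently, which is why the transformation counts as a reassociation.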
Loop bodies are also normally unrolled in preparation for SWP. This also limits the unrolling, since loops are not unrolled to more than n instructions in the unrolled body. Unrolling is also constrained by the unroll_times_max option described below. (Unrolling of loop bodies not expected to be software pipelined is controlled separately by -OPT:unroll_size and -OPT:unroll_times_max.)
DO i=1,n
IF ( a(i) .LT. b(i) ) THEN
c(i) = a(i)
ELSE
c(i) = b(i)
END IF
END DO
The loop body can be compiled for MIPS4 as:
ldc1 $f0,a(i)
ldc1 $f1,b(i)
c.lt.s $fcc1,$f0,$f1
movf.s $f0,$f1,$fcc1
sdc1 $f0,c(i)
Note that there are no conditional branches in the code. This option is ON by default for MIPS4 targets only.
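At source level, the same idea looks roughly like the C sketch below. The function name select_min is hypothetical, and whether a given compiler actually emits a conditional move for this code depends on the target and the options in effect. The point is only that the comparison yields a value that selects between two operands, rather than a branch that selects between two paths.

```c
#include <stddef.h>

/* Element-wise selection written so that it maps naturally onto a
   compare followed by a conditional move (hypothetical sketch). */
void select_min(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = a[i];          /* load a(i) */
        float y = b[i];          /* load b(i) */
        int lt = x < y;          /* plays the role of the compare */
        c[i] = lt ? x : y;       /* select instead of branching */
    }
}
```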
DO i=1,n
sum = sum + a(i)
END DO
Without interleaving, each iteration must wait for the previous iteration's add to complete, yielding a best case II of 4 cycles per iteration on the R8000. Interleaving can transform the loop to something equivalent to:
DO i=1,n,8
sum1 = sum1 + a(i)
sum2 = sum2 + a(i+1)
sum3 = sum3 + a(i+2)
sum4 = sum4 + a(i+3)
sum5 = sum5 + a(i+4)
sum6 = sum6 + a(i+5)
sum7 = sum7 + a(i+6)
sum8 = sum8 + a(i+7)
END DO
sum = sum + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8
This version can achieve an effective II of nearly 0.5 cycles.
These transformations generally require -OPT:roundoff=2 or better.
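A hand-written C sketch of reduction interleaving is shown below for illustration. The names sum_serial and sum_interleaved are hypothetical, and the cleanup loop for trip counts that are not a multiple of 8 is an assumption about how the leftover elements might be handled, not a description of the compiler's output. Because the additions are performed in a different order, the two functions can return slightly different results for general floating point data, which is the reason a relaxed roundoff setting is needed.

```c
#include <stddef.h>

/* Straightforward reduction: one serial chain of dependent adds. */
double sum_serial(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Interleaved reduction (hypothetical sketch): eight partial sums
   form eight independent chains that can overlap in the pipeline. */
double sum_interleaved(const double *a, size_t n)
{
    double s[8] = {0.0};
    size_t i = 0;

    /* Main loop: each partial sum takes every eighth element. */
    for (; i + 8 <= n; i += 8)
        for (int k = 0; k < 8; k++)
            s[k] += a[i + k];

    /* Cleanup for the remaining n mod 8 elements (assumed handling). */
    double tail = 0.0;
    for (; i < n; i++)
        tail += a[i];

    /* Combine the partial sums after the loop. */
    double sum = tail;
    for (int k = 0; k < 8; k++)
        sum += s[k];
    return sum;
}
```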
This option controls the maximum number of times inner loop bodies are unrolled before attempting pipelining. The default is 4 for MIPS4 and 1 for MIPS3. Unrolling is also constrained by the body_ins_count_max option described above. (Unrolling of loop bodies not expected to be software pipelined is controlled separately by -OPT:unroll_size and -OPT:unroll_times_max.)