
Looking at the Code Produced by Software Pipelining

The best way to look at the assembly code generated by software pipelining is to use the -S compiler switch. This is preferable to using the disassembler because the -S switch adds annotations to the assembly code that label the sections described above. The annotations also provide useful statistics about the software pipelining process, as well as the reasons why certain code did not pipeline. To get a summary of these annotations, do the following:

%f77 -64 -S -O3 -mips4 foo.f

This creates an annotated assembly file, foo.s.

%grep '#<swp' foo.s

Lines marked #<swpf are printed for loops that failed to software pipeline. Lines marked #<swps give statistics and other information about the loops that did software pipeline.
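As a rough sketch, the grep above can be wrapped in a small shell function that counts both kinds of annotation in one pass. The function name swp_summary is hypothetical, not a compiler tool. Because the exact layout of the #<swpf lines is not shown here, the function counts failure annotation lines rather than failed loops.

```shell
# swp_summary: hypothetical helper, not part of the compiler.
# Usage: swp_summary foo.s
swp_summary() {
    file=$1
    # One "Pipelined loop line N steady state" line appears per pipelined loop.
    pipelined=$(grep -c '#<swps> *Pipelined loop' "$file")
    # Counts #<swpf annotation lines, which may exceed the number of failed loops.
    failed=$(grep -c '#<swpf' "$file")
    echo "pipelined=$pipelined failed_lines=$failed"
}
```

A nonzero failed_lines count is a cue to read the #<swpf lines themselves for the reasons given.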

Another way to get a summary of the software pipelining annotations is to specify the -LIST:=ON flag on the command line. For example:

%f77 -64 -O3 -mips4 -LIST:=ON foo.f

This creates a .L file which contains a summary of the flags used by the compiler (including default values) and the software pipelining annotations.


Example 1: Output from Using the -S Compiler Switch

%cat test.f

      program test
      real*8 a, x(100000), y(100000)
      do i = 1, 2000
         call daxpy(3.7d0, x, y, 100000)
      enddo
      stop
      end
      subroutine daxpy(a, x, y, nn)
      real*8 a, x(*), y(*)
      do i = 1, nn, 1
         y(i) = y(i) + a * x(i)
      enddo
      return
      end

%f77 -64 -mips4 -O3 -S test.f

%grep swps test.s

#<swps>
#<swps> Pipelined loop line 11 steady state
#<swps>
#<swps> 4 unrollings before pipelining
#<swps> 6 cycles per 4 iterations
#<swps> 8 flops ( 33% of peak) (madds count as 2)
#<swps> 4 flops ( 33% of peak) (madds count as 1)
#<swps> 4 madds ( 33% of peak)
#<swps> 12 mem refs ( 100% of peak)
#<swps> 2 integer ops ( 16% of peak)
#<swps> 18 instructions ( 75% of peak)
#<swps> 1 short trip threshold
#<swps> 7 ireg registers used
#<swps> 11 fgr registers used
#<swps>

This shows that the inner loop starting at line 11 was software pipelined. The loop was unrolled four times before pipelining. In the steady state it uses 6 cycles for every 4 loop iterations, and the statistics are calculated as follows:
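The percentages in the annotation output can be reproduced with a little arithmetic. The sketch below assumes each figure is 100 * count / (cycles * per-cycle peak), truncated to an integer. The per-cycle peaks it uses (4 flops with madds counted as 2, 2 madds, 2 memory references, 2 integer operations, 4 instructions) are not documented values; they are inferred from the percentages printed above. The function name pct_of_peak is hypothetical.

```shell
# pct_of_peak COUNT CYCLES PEAK_PER_CYCLE
# Assumed formula: 100 * count / (cycles * peak), truncated to an integer.
pct_of_peak() {
    awk -v n="$1" -v c="$2" -v p="$3" \
        'BEGIN { printf "%d\n", 100 * n / (c * p) }'
}

cycles=6                      # from "6 cycles per 4 iterations"

# Per-cycle peaks below are assumptions inferred from the output.
pct_of_peak  8 "$cycles" 4    # flops, madds counted as 2 -> 33
pct_of_peak  4 "$cycles" 2    # madds                     -> 33
pct_of_peak 12 "$cycles" 2    # mem refs                  -> 100
pct_of_peak  2 "$cycles" 2    # integer ops               -> 16
pct_of_peak 18 "$cycles" 4    # instructions              -> 75

# Cost of one original loop iteration in the steady state.
awk 'BEGIN { printf "%.2f cycles per iteration\n", 6 / 4 }'
```

Under these assumptions the memory references are the resource at 100% of peak, which is consistent with the loop needing all 6 cycles for its 12 memory references.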

