- DDOPT(1) Last changed: 1-4-99
-
-
- NAME
- ddopt - MIPS Data-Dependency-based Optimizer
-
- SYNOPSIS
- ddopt unopt_file opt_file [ -v -mips3 -hostcache -cachesz size ]
-
- IMPLEMENTATION
- IRIX systems
-
- DESCRIPTION
- ddopt, the MIPS data-dependency-based optimizer, reads the input
- binary ucode file one procedure at a time, performs loop-based
- transformations on each outermost loop nest in each procedure,
- and writes the optimized binary ucode file. By convention, it takes
- a binary ucode file with the extension .B or .M as input and outputs a
- binary ucode file with the extension .D. In the compilation process,
- ddopt runs after the front end, after uld and usplit, and before
- umerge, uopt, and ugen. Currently, ddopt accepts only ucode files
- generated from FORTRAN.
-
- ddopt borrows optimization techniques that originated in compilers
- for supercomputers and adapts them to scalar machines. It
- performs high-level analysis of the behavior of array accesses in
- loops, deriving what we call data dependency information. Numerous
- optimizing transformations of the program code are performed based
- on this information (hence the name ddopt). The transformations
- are invariably associated with program loops that operate on arrays.
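-
- As an illustration of the kind of information involved, the sketch
- below classifies the dependence between a write to a(i + w) and a read
- of a(i + r) inside a single loop over i. The function name and the
- one-subscript model are assumptions made for this example; they do not
- describe the analysis ddopt actually implements.

```python
def classify_dependence(write_offset, read_offset):
    """Classify the dependence between a write to a[i + write_offset]
    and a read of a[i + read_offset] in one loop over i (step +1).

    Hypothetical helper for illustration only; not part of ddopt.
    """
    # The element written in iteration i is read in iteration i + distance.
    distance = write_offset - read_offset
    if distance > 0:
        return ("true", distance)        # written first, read in a later iteration
    if distance < 0:
        return ("anti", -distance)       # read first, overwritten in a later iteration
    return ("loop-independent", 0)       # both accesses fall in the same iteration


# a[i] = a[i - 1] + 1 reads the value written one iteration earlier.
print(classify_dependence(0, -1))   # ('true', 1)
# a[i] = a[i + 1] + 1 reads a value that the next iteration overwrites.
print(classify_dependence(0, 1))    # ('anti', 1)
```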
-
- ddopt performs three kinds of transformations that benefit program
- performance:
-
- 1. Those that reduce memory references. Techniques include re-using
- array references that have been allocated to registers (register
- allocation for array references) and moving array references and
- assignments out of loops.
-
- 2. Those that improve locality of memory references (thus reducing
- data cache misses). Techniques include changing the order of loop
- nests (loop interchange) and partitioning loop iterations to
- operate on smaller sections of an array (strip-mining).
-
- 3. Those that reduce floating-point interlocks and promote greater
- parallelism among floating-point operations by creating larger
- pieces of straight-line code in loops. Techniques include loop
- unrolling and unroll-and-jam (unrolling an outer loop and jamming
- the resulting copies of the inner loop into one larger loop).
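-
- The first kind of transformation can be sketched on a three-point
- stencil. The function names are hypothetical, and the rewritten form
- assumes that a, b and scale do not overlap in memory; it shows the
- general idea of keeping array values in scalars, not the code ddopt
- generates.

```python
def smooth_naive(a, b, scale):
    """Three loads from a[] and one load of scale[0] in every iteration."""
    n = len(a)
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) * scale[0]


def smooth_reuse(a, b, scale):
    """Same result with one new load from a[] per iteration."""
    n = len(a)
    if n < 3:
        return
    s = scale[0]                  # loop-invariant reference moved out of the loop
    prev, cur = a[0], a[1]        # values carried in scalars ("registers")
    for i in range(1, n - 1):
        nxt = a[i + 1]            # the only element not already held in a scalar
        b[i] = (prev + cur + nxt) * s
        prev, cur = cur, nxt      # rotate the values for the next iteration


a = [1, 4, 9, 16, 25, 36]
b_naive, b_reuse = [0] * 6, [0] * 6
smooth_naive(a, b_naive, [2])
smooth_reuse(a, b_reuse, [2])
print(b_naive == b_reuse)   # True
```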
-
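- The second and third kinds of transformation can be sketched on a
- matrix-vector product. The function names and the strip size are
- assumptions for this example. Python lists of rows behave like
- row-major storage, whereas FORTRAN arrays are column-major, so the
- loop order that is profitable in FORTRAN is the opposite of the one
- shown here.

```python
def matvec_baseline(a, x):
    n, m = len(a), len(x)
    y = [0.0] * n
    for j in range(m):              # outer loop walks the columns
        for i in range(n):          # inner loop jumps from row to row
            y[i] += a[i][j] * x[j]
    return y


def matvec_interchanged(a, x):
    n, m = len(a), len(x)
    y = [0.0] * n
    for i in range(n):              # loops swapped (loop interchange)
        for j in range(m):          # inner loop now walks one row in order
            y[i] += a[i][j] * x[j]
    return y


def matvec_strip_mined(a, x, strip=2):
    n, m = len(a), len(x)
    y = [0.0] * n
    for jj in range(0, m, strip):   # strip loop, moved outermost
        for i in range(n):
            # every row reuses the same small section of x
            for j in range(jj, min(jj + strip, m)):
                y[i] += a[i][j] * x[j]
    return y


def matvec_unroll_and_jam(a, x):
    n, m = len(a), len(x)
    y = [0.0] * n
    for i in range(0, n - 1, 2):    # outer loop unrolled by 2
        y0 = y1 = 0.0
        for j in range(m):          # the two inner-loop copies jammed into one
            xj = x[j]               # loaded once, used by both copies
            y0 += a[i][j] * xj
            y1 += a[i + 1][j] * xj
        y[i], y[i + 1] = y0, y1
    if n % 2:                       # leftover row when n is odd
        for j in range(m):
            y[n - 1] += a[n - 1][j] * x[j]
    return y
```

- All four functions add the products for each y[i] in the same order,
- so they return identical results for the same inputs.
-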
- ddopt performs other optimizations solely to create more
- opportunities for the above transformations: local common
- subexpression elimination, secondary index variable elimination,
- constant propagation, copy propagation, constant folding, jump
- folding, and dead code elimination. Some of these duplicate
- optimizations performed in uopt. They are applied
- iteratively until the code no longer changes, and they
- precede the data-dependency-based analyses and transformations.
-
- The following options are interpreted by ddopt. Options starting with
- -X are not recognized by the compiler driver and must be passed to
- ddopt via -Wd,... instead.
-
- -v Turns on verbose mode. In this mode, ddopt prints the
- name of the procedure it is currently optimizing.
-
- -mips3 Tells ddopt that the target machine uses the MIPS3 instruction
- set.
-
- -hostcache
- Tells ddopt to assume that the target machine has the same
- data cache size as the host machine, so that it can obtain the
- cache size through a system call.
-
- -cachesz size
- Gives ddopt the data cache size of the target machine, in
- bytes. The default is 8192 bytes.
-
- -Xbldgr Dumps the computed data dependency information, for debugging
- purposes.
-
- -Xbboptoff
- Turns off the conventional global optimizations that precede
- the data-dependency-related transformations.
-
- -Xbf size
- Changes the blocking factor used by ddopt in strip-mining.
- The default is 36 bytes.
-
- -Xdump Tells ddopt to dump the original and transformed program in a
- compact, close-to-source-level format.
-
- -Xdosizethreshold count
- If the number of statements in a DO loop exceeds this number,
- that DO loop is excluded from transformation by ddopt. The
- default is 150.
-
- -Xgcopyoff
- Turns off global copy propagation.
-
- -Xinteroff
- Turns off loop interchange.
-
- -Xindepregoff
- Turns off loop-independent dependence register allocation.
-
- -Xinputregoff
- Turns off input dependence register allocation.
-
- -Xinvarregoff
- Turns off loop-invariant register allocation.
-
- -Xlcopyoff
- Turns off local copy propagation.
-
- -Xmergepiblockoff
- Disallows the merging of pi-blocks created for statements in
- the same basic block.
-
- -Xmoreunrolljam
- By default, unroll-and-jam is performed only on inner loop
- nests produced by strip-mining. This flag removes that
- restriction and tells ddopt to apply unroll-and-jam whenever it
- is advantageous.
-
- -Xmax_int_regs
- Tells ddopt the number of integer registers available in the
- underlying machine. The default is 32.
-
- -Xmax_float_regs
- Tells ddopt the number of floating-point registers available
- in the underlying machine. The default is 16.
-
- -Xofffoo
- Turns off all transformations for the procedure whose name
- follows -Xoff (foo, in this case).
-
- -Xoutputregoff
- Turns off output dependence register allocation.
-
- -Xoverallocate
- Tells ddopt to perform register allocation without regard to
- the number of registers available in the underlying machine.
-
- -Xstripoff
- Turns off strip-mining.
-
- -Xstriponly
- Tells ddopt to perform strip-mining but prevents the newly
- formed loops from being interchanged into a deeper region of
- the loop nest. For debugging purposes only.
-
- -Xstat Prints optimization statistics, giving the line numbers at
- which transformations were applied and the number of times each
- kind of transformation was applied.
-
- -Xtrueregoff
- Turns off true dependence register allocation.
-
- -Xunrolloff
- Turns off loop unrolling.
-
- -Xunrolljamoff
- Turns off unroll-and-jam.
-
- -Xunrollthreshold count
- Limits unrolling so that the number of statements in the
- unrolled loop does not exceed this number. The default is 180.
-
- -Xunrolltimes count
- Sets the maximum number of times to unroll a loop. The
- default is 4.
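-
- One plausible reading of how the two unrolling limits combine is
- sketched below. The function and its parameter names are hypothetical,
- and the sketch assumes that unrolling a loop N times multiplies its
- statement count by N. It is not taken from the ddopt implementation.

```python
def pick_unroll_factor(stmts_in_loop, unroll_threshold=180, unroll_times=4):
    """Return the largest factor that does not exceed unroll_times and
    keeps the unrolled body within unroll_threshold statements.

    Hypothetical model of -Xunrollthreshold and -Xunrolltimes; the
    defaults are the documented ones, the logic is an assumption.
    """
    factor = 1                                   # 1 means the loop is left as is
    for candidate in range(2, unroll_times + 1):
        if stmts_in_loop * candidate <= unroll_threshold:
            factor = candidate
    return factor


print(pick_unroll_factor(10))    # 4: limited by the unroll count
print(pick_unroll_factor(50))    # 3: 4 copies would give 200 statements
print(pick_unroll_factor(100))   # 1: even 2 copies exceed the threshold
```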
-
- DIAGNOSTICS
- ddopt assumes the input ucode file is error-free.
-
- SEE ALSO
- ucode(1), uopt(1), btou(1), ppu(1)
-
- This man page is available only online.
-
-