LP64 Model Implications for Assembly Language Code

There are four implications for writing assembly language code for LP64:

The first deals with different register sizes as explained in "Different Register Sizes."
The second requires you to use a different subroutine linkage convention, as explained in "Using a Different Subroutine Linkage."
The third requires you to use a different convention to save the global pointer register ($gp) as explained in "Caller $gp (o32) vs. Callee Saved $gp (LP64)."
The fourth restricts your use of lwc1 instructions to access floating point register pairs but allows you to use more floating point registers as described in "Using More Floating Point Registers."

Different Register Sizes

The MIPSpro 64-bit C compiler generates code in the LP64 model; that is, pointers and longs are 64 bits and ints are 32 bits. This means that all assembly code that deals with pointers or longs must be converted to use doubleword instructions for MIPS III/IV code, and must continue to use word instructions for MIPS I/II code.

Macros in <sys/asm.h>, coupled with the compiler predefines, provide a solution to this problem. The macros have the form PTR_<op>, LONG_<op>, or INT_<op>, with one family each for pointers, longs, and ints, where <op> is an operation such as L (load), S (store), or ADD. Each family tests a standard predefine such as _MIPS_SZPTR and resolves to doubleword opcodes for MIPS3 and to word opcodes for MIPS1; for example, PTR_L becomes ld when _MIPS_SZPTR is 64 and lw when it is 32.
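The selection these macros make can be modeled outside the assembler. The C sketch below is a hypothetical helper, not part of <sys/asm.h>: given an operand width in bits (the value a predefine such as _MIPS_SZPTR, _MIPS_SZLONG, or _MIPS_SZINT would carry), it returns the opcode names that the corresponding _L, _S, _ADDU, and _SUBU macros resolve to in the header excerpt later in this section.

```c
#include <assert.h>
#include <string.h>

/* Opcode names chosen by one macro family (PTR_, LONG_ or INT_).
   Hypothetical type for illustration; not defined in <sys/asm.h>. */
struct width_ops {
    const char *load;   /* <family>_L    */
    const char *store;  /* <family>_S    */
    const char *addu;   /* <family>_ADDU */
    const char *subu;   /* <family>_SUBU */
};

/* Mirrors the #if (_MIPS_SZxxx == 32) / (== 64) blocks of <sys/asm.h>. */
static struct width_ops ops_for_width(int bits)
{
    struct width_ops word  = { "lw", "sw", "addu",  "subu"  };
    struct width_ops dword = { "ld", "sd", "daddu", "dsubu" };

    assert(bits == 32 || bits == 64);  /* the only sizes the header handles */
    return bits == 64 ? dword : word;
}
```

Under LP64, where pointers and longs are 64 bits and ints are 32 bits, ops_for_width(64) gives the opcodes for the PTR_ and LONG_ families and ops_for_width(32) gives those for the INT_ family.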

Using a Different Subroutine Linkage

The second implication of LP64 is that there is a different subroutine linkage convention and a different register naming convention. The compiler predefine _MIPS_SIM enables the corresponding macros in <sys/asm.h> and <sys/regdef.h>. Some important ramifications of that linkage convention are described below.

In the _MIPS_SIM_ABI64 model there are 8 argument registers - $4 .. $11. These additional 4 argument registers come at the expense of the temp registers in <sys/regdef.h>. In this model, there are no registers t4 .. t7, so any code using these registers does not compile under this model. Similarly, the register names a4 .. a7 are not available under the _MIPS_SIM_ABI32 model. (It should be pointed out that those temporary registers are not lost -- the argument registers can serve as scratch registers also, with certain constraints.)

To make it easier to convert assembly code, the new names ta0, ta1, ta2, and ta3 are available under both _MIPS_SIM models. They alias t4 .. t7 in the 32-bit world and a4 .. a7 in the 64-bit world.
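The register-name mapping described in the two paragraphs above can be summarized as a lookup. The C sketch below is a hypothetical helper written for illustration (it does not read the real <sys/regdef.h>). It returns the hardware register number for an argument or temporary register name under each model, or -1 where the name does not exist. It assumes the usual assignments: $4..$7 for a0..a3 in both models, $8..$15 for t0..t7 under _MIPS_SIM_ABI32, and, under _MIPS_SIM_ABI64, $8..$11 for a4..a7 with t0..t3 moved to $12..$15.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup: hardware register number for an a*, t* or ta*
   name under _MIPS_SIM_ABI64 (abi64 != 0) or _MIPS_SIM_ABI32 (abi64 == 0).
   Returns -1 when the name is not defined under that model. */
static int reg_number(const char *name, int abi64)
{
    int idx;

    assert(name != NULL);

    /* ta0..ta3: a4..a7 ($8..$11) for ABI64, t4..t7 ($12..$15) for ABI32 */
    if (strncmp(name, "ta", 2) == 0 && sscanf(name + 2, "%d", &idx) == 1)
        return (idx >= 0 && idx <= 3) ? (abi64 ? 8 : 12) + idx : -1;

    /* a0..a3 exist in both models; a4..a7 only in ABI64 */
    if (name[0] == 'a' && sscanf(name + 1, "%d", &idx) == 1)
        return (idx >= 0 && idx <= (abi64 ? 7 : 3)) ? 4 + idx : -1;

    if (name[0] == 't' && sscanf(name + 1, "%d", &idx) == 1) {
        if (idx == 8 || idx == 9)          /* t8, t9 are $24, $25 in both */
            return 16 + idx;
        /* ABI32: t0..t7 = $8..$15; ABI64: t0..t3 = $12..$15, no t4..t7 */
        if (idx >= 0 && idx <= (abi64 ? 3 : 7))
            return (abi64 ? 12 : 8) + idx;
    }
    return -1;
}
```

Because ta0 resolves to a defined register under both models, source written with ta0..ta3 assembles either way, while source written with t4 or a4 assembles under only one of them.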

Another facet of the linkage convention is that the caller no longer has to reserve space for a called function to store its arguments in. The called routine allocates space for storing its arguments on its own stack, if desired. The NARGSAVE define in <sys/asm.h> helps with this.
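The effect of NARGSAVE on frame size can be worked through with the FRAMESZ formula used in the template later in this section. The C sketch below evaluates that formula. It assumes the NARGSAVE, SZREG, ALSZ, and ALMASK values shown in the <sys/asm.h> excerpt (o32 with a MIPS I/II ISA versus ABI64 with MIPS III/IV); frame_size is a hypothetical helper name.

```c
#include <assert.h>

/* Evaluates (((NARGSAVE + LOCALSZ) * SZREG) + ALSZ) & ALMASK, where
   ALMASK is ~ALSZ, so the result is rounded up to a multiple of ALSZ + 1. */
static long frame_size(long nargsave, long localsz, long szreg, long alsz)
{
    assert(((alsz + 1) & alsz) == 0);   /* ALSZ + 1 must be a power of two */
    return ((nargsave + localsz) * szreg + alsz) & ~alsz;
}
```

With one local slot (LOCALSZ of 1), the o32 frame is frame_size(4, 1, 4, 7), or 24 bytes, 16 of which are the argument save area reserved for callees. The ABI64 frame is frame_size(0, 1, 8, 15), or 16 bytes: NARGSAVE is 0, and only padding to 16-byte alignment is added.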

Caller $gp (o32) vs. Callee Saved $gp (LP64)

The $gp register is used to point to the Global Offset Table (GOT). The GOT stores addresses of subroutines and static data for runtime linking. Since each DSO has its own GOT, the $gp register must be saved across function calls. Two conventions are used to save the $gp register.

Under the first convention, called caller-saved $gp, each time a function call is made, the calling routine saves $gp and then restores it after the called function returns. To facilitate this, two assembly language pseudo-ops are used. The first, .cpload, is used at the beginning of a function and sets up $gp with the correct value. The second, .cprestore, saves the value of $gp on the stack at an offset specified by the user. It also causes the assembler to emit code to restore $gp after each call to a subroutine.

The formats of the .cpload and .cprestore directives are:

.cpload reg            reg is t9 by convention
.cprestore offset      offset is the stack offset where $gp is saved
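The caller-saved protocol can be modeled in C to show why the reload after every call is needed. In this sketch a global variable stands in for $gp, and two hypothetical GOT addresses stand in for two DSOs. The callee points $gp at its own GOT (as its .cpload would) without restoring it. Everything here is illustrative; no real register or GOT is touched.

```c
#include <assert.h>

/* Hypothetical stand-ins: one variable models $gp, two constants model
   the GOT addresses of two DSOs. */
static unsigned long gp;
enum { MAIN_GOT = 0x1000, LIB_GOT = 0x2000 };

/* Callee in another DSO: like .cpload, it loads its own GOT address into
   gp and, under the caller-saved convention, never restores it. */
static void lib_func(void)
{
    gp = LIB_GOT;
}

/* Caller: .cprestore saves gp in a stack slot, and the assembler reloads
   it from that slot after every call. */
static void main_func(void)
{
    unsigned long gp_slot = gp;   /* models the .cprestore save */
    lib_func();
    gp = gp_slot;                 /* models the reload after the call */
    assert(gp == MAIN_GOT);       /* later GOT accesses use the right table */
}
```

Without the reload, every access the caller makes through $gp after the call would index the callee's GOT instead of its own.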

Under the second convention, called callee-saved $gp, the responsibility for saving $gp is placed on the called function. The called function must save $gp when it first starts executing and restore it just before it returns. To accomplish this, the .cpsetup pseudo-op is used. Its format is:

.cpsetup reg, offset, proc_name
        reg is t9 by convention
        offset is the stack offset where $gp is saved
        proc_name is the name of the subroutine

You must create a stack frame by subtracting the appropriate value from the $sp register before using the directives which save the $gp on the stack.
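The callee-saved protocol can be modeled the same way. In this hypothetical C sketch, a global variable stands in for $gp and a local variable stands in for the slot at the given offset in the callee's stack frame. The callee saves the incoming value on entry, points $gp at its own GOT, and restores the saved value just before returning, so the caller does nothing around the call. The names and addresses are illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-ins: one variable models $gp, two constants model
   the GOT addresses of two DSOs. */
static unsigned long gp;
enum { MAIN_GOT = 0x1000, LIB_GOT = 0x2000 };

/* Callee in another DSO under the callee-saved convention. */
static void lib_func(void)
{
    unsigned long gp_slot = gp;   /* save caller's gp in the frame slot */
    gp = LIB_GOT;                 /* set up gp for this DSO's GOT */
    /* ... body makes GOT-relative accesses through gp ... */
    gp = gp_slot;                 /* restore before returning */
}

/* Caller: no save or reload around the call is needed. */
static unsigned long main_func(void)
{
    lib_func();
    return gp;
}
```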

To make it easier to write assembly language code for both conventions, several macros are defined in <sys/asm.h>. The macros SETUP_GP, SETUP_GPX, SETUP_GP_L, and SAVE_GP are defined under o32 and provide the functionality needed for a caller-saved $gp environment; under LP64 these macros are null. Conversely, SETUP_GP64, SETUP_GPX64, SETUP_GPX64_L, and RESTORE_GP64 provide the functionality for a callee-saved $gp environment and are null under o32.
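Writing out what each macro family contributes shows why one source file works under both models. The C sketch below is a hypothetical text generator, not the real macro definitions. It assumes that under o32 SETUP_GP and SAVE_GP(off) yield .cpload t9 and .cprestore off, and that under LP64 SETUP_GP64(off, proc) yields .cpsetup t9, off, proc and RESTORE_GP64 yields .cpreturn. The stack-pointer adjustment that sits between these directives in a real prologue is omitted. Check the expansions against the <sys/asm.h> on your system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator of the $gp lines that the <sys/asm.h> macros
   contribute to a prologue. Under each model the other family expands
   to nothing, so only one set of directives appears. */
static void gp_prologue(int lp64, int gpoff, const char *proc,
                        char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (lp64)   /* SETUP_GP and SAVE_GP are null; SETUP_GP64 expands */
        snprintf(buf, len, ".cpsetup t9, %d, %s\n", gpoff, proc);
    else        /* SETUP_GP64 is null; SETUP_GP and SAVE_GP expand */
        snprintf(buf, len, ".cpload t9\n.cprestore %d\n", gpoff);
}

/* RESTORE_GP64 (assumed to be .cpreturn) under LP64; null under o32. */
static const char *gp_epilogue(int lp64)
{
    return lp64 ? ".cpreturn\n" : "";
}
```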

In summary, predefines from the compiler enable a body of macros that generate 32-bit or 64-bit assembly code from the same source. Those macros are defined in <sys/asm.h>, <sys/regdef.h>, and <sys/fpregdef.h>.

The following example handles assembly language coding issues for LP64 and KPIC (KPIC requires the assembly programmer to deal with PIC issues). It provides a template for the start and end of a generic assembly language routine.

The template is followed by relevant defines and macros from <sys/asm.h>.

 LOCALSZ=        5               # save a0, a1, ra, gp, t0
 FRAMESZ=        (((NARGSAVE+LOCALSZ)*SZREG)+ALSZ)&ALMASK
 RAOFF=          FRAMESZ-(1*SZREG)
 A0OFF=          FRAMESZ-(2*SZREG)
 A1OFF=          FRAMESZ-(3*SZREG)
 GPOFF=          FRAMESZ-(4*SZREG)
 T0OFF=          FRAMESZ-(5*SZREG)

 NESTED(asmfunc,FRAMESZ,ra)
        move t0, gp   # save entering gp
                      # SIM_ABI64 has gp callee save
                      # no harm for SIM_ABI32
        SETUP_GPX(t8)
        PTR_SUBU sp,FRAMESZ
        SETUP_GP64(GPOFF,asmfunc)
        SAVE_GP(GPOFF)
/* Save registers as needed here */
        REG_S ra,RAOFF(sp)
        REG_S a0,A0OFF(sp)
        REG_S a1,A1OFF(sp)
        REG_S t0,T0OFF(sp)

/* do real work here */
/* safe to call other functions */

/* restore saved registers as needed here */
        REG_L ra,RAOFF(sp)
        REG_L a0,A0OFF(sp)
        REG_L a1,A1OFF(sp)
        REG_L t0,T0OFF(sp)

/* restore $gp and the stack pointer, then return */
        RESTORE_GP64
        PTR_ADDU sp,FRAMESZ

        j        ra

        END(asmfunc)
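The offset definitions at the top of the template place each saved register in its own SZREG-sized slot, counting down from the top of the frame, so that the save area stays clear of the NARGSAVE area at the bottom. The C sketch below uses hypothetical helpers named slot_offset and layout_ok. For four slots (ra, a0, a1, gp, that is, LOCALSZ of 4), FRAMESZ evaluates to 32 under both o32 with MIPS I/II and ABI64 with MIPS III/IV, using the excerpted header values. The sketch recomputes the slot offsets for that 32-byte frame and checks the property.

```c
#include <assert.h>

/* Offset of the k-th saved slot below the top of the frame, matching
   RAOFF = FRAMESZ-(1*SZREG), A0OFF = FRAMESZ-(2*SZREG), and so on. */
static long slot_offset(long framesz, long k, long szreg)
{
    return framesz - k * szreg;
}

/* Returns 1 if all `nslots` slots are SZREG-aligned and lie at or above
   the NARGSAVE argument save area at the bottom of the frame. */
static int layout_ok(long framesz, long nslots, long szreg, long nargsave)
{
    long k;
    for (k = 1; k <= nslots; k++) {
        long off = slot_offset(framesz, k, szreg);
        if (off % szreg != 0 || off < nargsave * szreg)
            return 0;
    }
    return 1;
}
```

For o32, GPOFF is slot_offset(32, 4, 4), or 16, exactly the top of the 16-byte argument save area. A fifth slot in the same 32-byte frame would fall inside that area, which is why LOCALSZ must count every slot the routine stores to.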

The .cpload and .cprestore directives are used only when generating KPIC code; they tell the assembler to initialize, save, and restore $gp.

The following are the relevant parts of <sys/asm.h>:

#if (_MIPS_SIM == _MIPS_SIM_ABI32)
#define NARGSAVE        4       
#define ALSZ            7       
#define ALMASK          ~7
#endif
#if (_MIPS_SIM == _MIPS_SIM_ABI64)
#define NARGSAVE        0       
#define ALSZ            15      
#define ALMASK          ~0xf
#endif

#if (_MIPS_ISA == _MIPS_ISA_MIPS1 || _MIPS_ISA ==_MIPS_ISA_MIPS2)
#define SZREG           4
#endif

#if (_MIPS_ISA == _MIPS_ISA_MIPS3 || _MIPS_ISA == _MIPS_ISA_MIPS4)
#define SZREG           8
#endif

#if (_MIPS_ISA == _MIPS_ISA_MIPS1 || _MIPS_ISA == _MIPS_ISA_MIPS2)
#define REG_L   lw
#define REG_S   sw
#endif

#if (_MIPS_ISA == _MIPS_ISA_MIPS3 || _MIPS_ISA == _MIPS_ISA_MIPS4)
#define REG_L   ld
#define REG_S   sd
#endif

#if (_MIPS_SZINT == 32)
#define INT_L   lw
#define INT_S   sw
#define INT_LLEFT       lwl
#define INT_SLEFT       swl
#define INT_LRIGHT      lwr
#define INT_SRIGHT      swr
#define INT_ADD         add
#define INT_ADDI        addi
#define INT_ADDIU       addiu
#define INT_ADDU        addu
#define INT_SUB         sub
#define INT_SUBI        subi
#define INT_SUBIU       subiu
#define INT_SUBU        subu
#define INT_LL          ll
#define INT_SC          sc
#endif

#if (_MIPS_SZINT == 64)
#define INT_L   ld
#define INT_S   sd
#define INT_LLEFT       ldl     
#define INT_SLEFT       sdl     
#define INT_LRIGHT      ldr     
#define INT_SRIGHT      sdr     
#define INT_ADD         dadd
#define INT_ADDI        daddi
#define INT_ADDIU       daddiu
#define INT_ADDU        daddu
#define INT_SUB         dsub
#define INT_SUBI        dsubi
#define INT_SUBIU       dsubiu
#define INT_SUBU        dsubu
#define INT_LL          lld
#define INT_SC          scd
#endif

#if (_MIPS_SZLONG == 32)
#define LONG_L  lw
#define LONG_S  sw
#define LONG_LLEFT      lwl     
#define LONG_SLEFT      swl     
#define LONG_LRIGHT     lwr     
#define LONG_SRIGHT     swr     
#define LONG_ADD        add
#define LONG_ADDI       addi
#define LONG_ADDIU      addiu
#define LONG_ADDU       addu
#define LONG_SUB        sub
#define LONG_SUBI       subi
#define LONG_SUBIU      subiu
#define LONG_SUBU       subu
#define LONG_LL         ll
#define LONG_SC         sc
#endif

#if (_MIPS_SZLONG == 64)
#define LONG_L  ld
#define LONG_S  sd
#define LONG_LLEFT      ldl
#define LONG_SLEFT      sdl
#define LONG_LRIGHT     ldr
#define LONG_SRIGHT     sdr
#define LONG_ADD        dadd
#define LONG_ADDI       daddi
#define LONG_ADDIU      daddiu
#define LONG_ADDU       daddu
#define LONG_SUB        dsub
#define LONG_SUBI       dsubi
#define LONG_SUBIU      dsubiu
#define LONG_SUBU       dsubu
#define LONG_LL         lld
#define LONG_SC         scd
#endif

#if (_MIPS_SZPTR == 32)
#define PTR_L   lw
#define PTR_S   sw
#define PTR_LLEFT       lwl
#define PTR_SLEFT       swl
#define PTR_LRIGHT      lwr
#define PTR_SRIGHT      swr
#define PTR_ADD         add
#define PTR_ADDI        addi
#define PTR_ADDIU       addiu
#define PTR_ADDU        addu
#define PTR_SUB         sub
#define PTR_SUBI        subi
#define PTR_SUBIU       subiu
#define PTR_SUBU        subu
#define PTR_LL          ll
#define PTR_SC          sc
#endif

#if (_MIPS_SZPTR == 64)
#define PTR_L   ld
#define PTR_S   sd
#define PTR_LLEFT       ldl
#define PTR_SLEFT       sdl
#define PTR_LRIGHT      ldr
#define PTR_SRIGHT      sdr
#define PTR_ADD         dadd
#define PTR_ADDI        daddi
#define PTR_ADDIU       daddiu
#define PTR_ADDU        daddu
#define PTR_SUB         dsub
#define PTR_SUBI        dsubi
#define PTR_SUBIU       dsubiu
#define PTR_SUBU        dsubu
#define PTR_LL          lld
#define PTR_SC          scd
#endif

Using More Floating Point Registers

On the R4000 and later generation MIPS microprocessors, the FPU provides:

16 64-bit floating point registers (FPRs), each made up of a pair of 32-bit floating point general purpose registers, when the FR bit in the Status register equals 0, or
32 64-bit floating point registers (FPRs), each corresponding to a 64-bit floating point general purpose register, when the FR bit in the Status register equals 1

For more information about the FPU of the R4000 refer to Chapter 6 of the MIPS R4000 User's Manual.
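The difference between the two FR settings can be expressed as a check on which register numbers a double precision operand may name. The C sketch below uses hypothetical helpers, not an assembler or kernel interface. With FR equal to 0, only the even-numbered registers $f0, $f2, ..., $f30 can hold a 64-bit value (each using the pair $fn/$fn+1), which gives 16. With FR equal to 1, all 32 can.

```c
#include <assert.h>

/* Nonzero if $f<reg> may be named by a double precision instruction
   when the Status register FR bit has the given value. */
static int valid_double_reg(int fr, int reg)
{
    assert(fr == 0 || fr == 1);
    if (reg < 0 || reg > 31)
        return 0;
    return fr ? 1 : (reg % 2 == 0);
}

/* Number of distinct 64-bit registers available for doubles. */
static int double_reg_count(int fr)
{
    int reg, n = 0;
    for (reg = 0; reg < 32; reg++)
        n += valid_double_reg(fr, reg);
    return n;
}
```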

Under o32, the FR bit is set to 0. As a result, o32 provides only 16 registers for double precision calculations, and double precision instructions must refer to even-numbered floating point general purpose registers. A major implication is that code written for the MIPS I instruction set treated a double precision floating point register as an even/odd pair of single precision floating point registers. Such code would typically use sequences of the following instructions to load and store double precision values:

lwc1 $f4, 4(a0)
lwc1 $f5, 0(a0)
... 
swc1 $f4, 4(t0)
swc1 $f5, 0(t0)

Under LP64, however, the FR bit is set to 1. As a result, LP64 provides all 32 floating point general purpose registers for double precision calculations. Since $f4 and $f5 refer to different double precision registers, the code sequence above will not work under LP64. It can be replaced with the following:

l.d $f4, 0(a0)
...
s.d $f4, 0(t0)

For l.d and s.d, the assembler automatically generates pairs of lwc1 (or swc1) instructions for MIPS I and uses the ldc1 (or sdc1) instruction for MIPS II and above.

In exchange for this restriction, you can use the additional odd-numbered registers to improve the performance of double precision code.
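The offsets in the MIPS I lwc1 sequence shown earlier follow from byte order. Assuming a big-endian target such as IRIX, the word at offset 0 holds the most significant half of the double and is loaded into the odd register $f5. The word at offset 4 holds the least significant half and is loaded into the even register $f4. The C sketch below, with hypothetical helper names, shows that split. It assumes IEEE 754 binary64 doubles and builds the big-endian memory image explicitly, so it gives the same result on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes `d` into `mem` in big-endian byte order, as it would sit in
   memory on a big-endian MIPS system (assumes IEEE 754 binary64). */
static void store_double_be(double d, unsigned char mem[8])
{
    uint64_t bits;
    int i;
    memcpy(&bits, &d, sizeof bits);
    for (i = 0; i < 8; i++)
        mem[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Models lwc1: reads the 32-bit big-endian word at mem + off. */
static uint32_t load_word_be(const unsigned char *mem, int off)
{
    return ((uint32_t)mem[off] << 24) | ((uint32_t)mem[off + 1] << 16) |
           ((uint32_t)mem[off + 2] << 8) | (uint32_t)mem[off + 3];
}
```

For 1.0 (bit pattern 0x3FF0000000000000), the word at offset 0 is 0x3FF00000 and the word at offset 4 is 0.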