Sizes of instrs

Now let's consider the correspondence between our [[instr]] type and the actual MIPS instructions we intend to emit. One important problem to solve is figuring out how big things are, so that we know what addresses to generate for the various labels. We will also want to know what address is currently stored in the program counter regsiter ([[pcreg]]), because we'll need to know when something is close enough that we can use a sixteen-bit address relative to that register. The kind of address we can use will determine how big things are. We'll rearrange the code so that we have a list of [[ref int * instr]] pairs, where the [[ref int]] stores the position in the list. (Positions start at zero.) Since in the MIPS all instructions are the same size, we measure position as number of instructions. While we're at it, we reverse the list so that the head will execute first, then the rest of the list. We begin with each position set to zero, and make a pass over the list trying to set the value of each position. We do this by estimating the size of (number of MIPS instructions generated for) each [[instr]]. Since there are forward references, we may not have all the distances right the first time, so we have to make a second pass. But during this second pass we could find that something is further away than we thought, and we have to switch from using a pc-relative mode to something else (or maybe grab the new pc?), which changes the size again, and moves things even further away. Because we can't control this process, we just keep making passes over the list until the process quiesces (we get the same size twice). In order to guarantee termination, we have to make sure later passes only increase the sizes of things. This is sufficient since there is a maximum number of MIPS instructions we can generate for each [[instr]]. While we're at it, we might want to complicate things by making the function that does the passes also emit code. For a single pass we hand an optional triple of emitters, the initial position, an [[int option]] for the program counter pointer (if known), and the instructions. I'm not sure what explains the use of the [[ref int]] to track the position, instead of just an [[int]]—it might be a desire to avoid the overhead of creating a bunch of new objects, or it might be really hard to do the passes cheaply. It should think a variation on [[map]] would do the job, but maybe I'm missing something. @ [[emit : int * int -> unit]] emits one instruction, and [[emit_string : int -> string -> unit]] emits a string constant. [[emit_string]] could be specified as a function of [[emit]], but the nature of the function would depend on whether the target machine was little-endian or big-endian, and we don't want to have that dependency built in. [[instrs]] is the list of instructions (in execute-head-last order). The second argument to [[pass]] indicates for what instructions code is to be generated. It is a record (position of next instruction, program counter pointer if any, remaining instructions to generate [with positions]). [[prepare]] produces two results: the instruction stream with size pointers added, and the total size of code to be generated. We add the total size because that is the only way to find the number of [[bltzal]]s, which are implicit in the instruction stream. «assembler»= fun prepare instrs = let fun add_positions(done, inst::rest) = add_positions( (ref 0, inst) :: done, rest) | add_positions(done, nil) = done val instrs' = add_positions(nil, instrs) (* reverse and add [[ref int]]s*) fun passes(oldsize) = (* make passes with no emission until size is stable*) let val size = pass false (0,NONE,instrs') in if size=oldsize then size else passes size end in size = passes 0, stream = instrs' end fun assemble instrs = pass true (0,NONE,#stream (prepare instrs)) «functions that assemble [[instr]]s into code»= fun get (SOME x) = x | get NONE = ErrorMsg.impossible "missing pcptr in mipscoder" «[[pcptr]] functions» «single pass» «assembler» fun codegen () = ( assemble (!kept); «reinitialize [[kept]]» ) @ The program counter pointer is a device that enables us to to addressing relative to the pcp register, register 31. The need for it arises when we want to access a data element which we know only by its label. The labels give us addresses relative to the beginning of the function, but we can only use addresses relative to some register. The answer is to set register 31 with a [[bltzal]] instruction, then use that for addressing. The function [[needs_a_pcptr]] determines when it is necessary to have a known value in register 31. That is, we need the program counter pointer «[[pcptr]] functions»= fun needs_a_pcptr(_,SLT(_,Immedlab _,_)) = true | needs_a_pcptr(_,ADD(_,Immedlab _,_)) = true | needs_a_pcptr(_,AND(_,Immedlab _,_)) = true | needs_a_pcptr(_,OR(_,Immedlab _,_)) = true | needs_a_pcptr(_,XOR(_,Immedlab _,_)) = true | needs_a_pcptr(_,MOVE(Immedlab _,_)) = true | needs_a_pcptr(_,LOAD(_,_,Immedlab _,_)) = true | needs_a_pcptr(_,STORE(_,_,Immedlab _,_)) = true | needs_a_pcptr(_,SLL(Immedlab _,_,_)) = true | needs_a_pcptr(_,SRA(Immedlab _,_,_)) = true | needs_a_pcptr(1, BEQ _) = false (* small BEQ's dont need pcptr *) | needs_a_pcptr(_, BEQ _) = true (* but large ones do *) | needs_a_pcptr(1, BCOP1 _) = false (* small BCOP1's dont need pcptr *) | needs_a_pcptr(_, BCOP1 _) = true (* but large ones do *) | needs_a_pcptr _ = false @ Creating the program counter pointer once, with a [[bltzal]], is not enough; we have to invalidate the program counter pointer at every label, since control could arrive at the label from God knows where, and therefore we don't know what the program counter pointer is. We use the function [[makepcptr]] to create a new program counter pointer ``on the fly'' while generating code for other [[instrs]]. (I chose not to create a special [[instr]] for [[bltzal]], which I could have inserted at appropriate points in the instruction stream.) To try and find an odd bug, I'm adding no-ops after each [[bltzal]]. I don't really believe they're necessary. The function [[gen]], which generates the instructions (or computes their size), takes three arguments. Third: the list of instructions to be generated (paired with pointers to their sizes); first: the position (in words) at which to generate those instructions; second: the current value of the program counter pointer (register 31), if known. The mutual recursion between [[gen]] and [[makepcptr]] maintains the program counter pointer. [[gen]] invalidates it at labels, and calls [[makepcptr]] to create a valid one when necessary (as determined by [[needs_a_pcptr]]). «single pass»= fun pass emit_now = let fun makepcptr(i,x) = (* may need to emit NOP for delay slot if next instr is branch *) let val size = case x of ((_,BEQ _)::rest) => 2 | ((_,BCOP1 _)::rest) => 2 | _ => 1 in if emit_now then (emit(Opcodes.bltzal(0,0)); if size=2 then emit(Opcodes.add(0,0,0)) else ()) else (); gen(i+size, SOME (i+2), x) end and gen(i,_,nil) = i | gen(i, _, (_,DEFINE lab) :: rest) = (lab := i; gen(i,NONE, rest)) (* invalidate the pc pointer at labels *) (* may want to do special fiddling with NOPs *) | gen(pos, pcptr, x as ((sizeref as ref size, inst) :: rest)) = if (pcptr=NONE andalso needs_a_pcptr(size, inst)) then makepcptr(pos,x) else if emit_now then «emit MIPS instructions» else «compute positions» in gen end @