Now let's consider the correspondence between our [[instr]] type and the
actual MIPS instructions we intend to emit.
One important problem to solve is figuring out how big things are,
so that we know what addresses to generate for the various labels.
We will also want to know what address is currently stored in the program
counter regsiter ([[pcreg]]),
because we'll need to know when something is close
enough that we can use a sixteen-bit address relative to that register.
The kind of address we can use will determine how big things are.
We'll rearrange the code so that we have a list of [[ref int * instr]] pairs,
where the [[ref int]] stores the position in the list.
(Positions start at zero.)
Since in the MIPS all instructions are the same size, we measure
position as number of instructions.
While we're at it, we reverse the list so that the head will execute first,
then the rest of the list.
We begin with each position set to zero, and make a pass over the list
trying to set the value of each position.
We do this by estimating the size of (number of MIPS instructions
generated for) each [[instr]].
Since there are forward references, we may not have all the distances right
the first time, so we have to make a second pass.
But during this second pass we could find that something is further
away than we thought, and we have to switch from using a pc-relative mode to
something else (or maybe grab the new pc?), which changes the size again,
and moves things even further away.
Because we can't control this process, we just keep making passes over the
list until the process quiesces (we get the same size twice).
In order to guarantee termination, we have to make sure later passes only
increase the sizes of things.
This is sufficient since there is a maximum number of MIPS instructions
we can generate for each [[instr]].
While we're at it, we might want to complicate things by making the function
that does the passes also emit code.
For a single pass we hand an optional triple of emitters, the initial position,
an [[int option]] for the program counter pointer (if known), and the
instructions.
I'm not sure what explains the use of the [[ref int]] to track the position,
instead of just an [[int]]—it might be a desire to avoid the
overhead of creating a bunch of new objects, or it might be really hard
to do the passes cheaply.
It should think a variation on [[map]] would do the job, but maybe I'm
missing something.
@
[[emit : int * int -> unit]] emits one instruction,
and [[emit_string : int -> string -> unit]] emits a string constant.
[[emit_string]] could be specified as a function of [[emit]],
but the nature of the function would depend on whether the target
machine was little-endian or big-endian, and we don't want to have
that dependency built in.
[[instrs]] is the
list of instructions (in execute-head-last order).
The second argument to [[pass]] indicates for what instructions code
is to be generated.
It is a record (position of next instruction, program counter pointer if any,
remaining instructions to generate [with positions]).
[[prepare]] produces two results: the instruction stream with
size pointers added, and the total size of code to be generated.
We add the total size because that is the only way to find the number
of [[bltzal]]s, which are implicit in the instruction stream.
«assembler»=
fun prepare instrs =
let fun add_positions(done, inst::rest) =
add_positions( (ref 0, inst) :: done, rest)
| add_positions(done, nil) = done
val instrs' = add_positions(nil, instrs) (* reverse and add [[ref int]]s*)
fun passes(oldsize) =
(* make passes with no emission until size is stable*)
let val size = pass false (0,NONE,instrs')
in if size=oldsize then size
else passes size
end
in size = passes 0, stream = instrs'
end
fun assemble instrs =
pass true (0,NONE,#stream (prepare instrs))
«functions that assemble [[instr]]s into code»=
fun get (SOME x) = x
| get NONE = ErrorMsg.impossible "missing pcptr in mipscoder"
«[[pcptr]] functions»
«single pass»
«assembler»
fun codegen () = (
assemble (!kept);
«reinitialize [[kept]]»
)
@
The program counter pointer is a device that enables us to to addressing
relative to the pcp register, register 31.
The need for it arises when we want to access a data element which we know
only by its label.
The labels give us addresses relative to the beginning of the function,
but we can only use addresses relative to some register.
The answer is to set register 31 with a [[bltzal]] instruction,
then use that for addressing.
The function [[needs_a_pcptr]] determines when it is necessary
to have a known value in register 31.
That is, we need the program counter pointer
- at [[NOP]] for a reason to be named later?
- at any operation that uses an effective address that refers to a label
(since all labels have to be relative to the program counter).
- BEQ's and BCOP1's to very far away,
since we have to compute the address for a JUMP
knowing the value of the program counter pointer.
«[[pcptr]] functions»=
fun needs_a_pcptr(_,SLT(_,Immedlab _,_)) = true
| needs_a_pcptr(_,ADD(_,Immedlab _,_)) = true
| needs_a_pcptr(_,AND(_,Immedlab _,_)) = true
| needs_a_pcptr(_,OR(_,Immedlab _,_)) = true
| needs_a_pcptr(_,XOR(_,Immedlab _,_)) = true
| needs_a_pcptr(_,MOVE(Immedlab _,_)) = true
| needs_a_pcptr(_,LOAD(_,_,Immedlab _,_)) = true
| needs_a_pcptr(_,STORE(_,_,Immedlab _,_)) = true
| needs_a_pcptr(_,SLL(Immedlab _,_,_)) = true
| needs_a_pcptr(_,SRA(Immedlab _,_,_)) = true
| needs_a_pcptr(1, BEQ _) = false (* small BEQ's dont need pcptr *)
| needs_a_pcptr(_, BEQ _) = true (* but large ones do *)
| needs_a_pcptr(1, BCOP1 _) = false (* small BCOP1's dont need pcptr *)
| needs_a_pcptr(_, BCOP1 _) = true (* but large ones do *)
| needs_a_pcptr _ = false
@
Creating the program counter pointer once, with a [[bltzal]], is not
enough; we have to invalidate the program counter pointer at every
label, since control could arrive at the label from God knows where, and
therefore we don't know what the program counter pointer is.
We use the function [[makepcptr]] to create a new program counter pointer
``on the fly'' while generating code for other [[instrs]].
(I chose not to create a special [[instr]] for [[bltzal]], which I
could have inserted at appropriate points in the instruction stream.)
To try and find an odd bug, I'm adding no-ops after each [[bltzal]].
I don't really believe they're necessary.
The function [[gen]], which generates the instructions (or computes
their size), takes three arguments.
Third: the list of instructions to be generated (paired with pointers
to their sizes); first: the position (in words) at which to generate
those instructions; second: the current value of the program counter
pointer (register 31), if known.
The mutual recursion between [[gen]] and [[makepcptr]] maintains
the program counter pointer.
[[gen]] invalidates it at labels, and calls [[makepcptr]] to create a valid
one when necessary (as determined by [[needs_a_pcptr]]).
«single pass»=
fun pass emit_now =
let fun makepcptr(i,x) =
(* may need to emit NOP for delay slot if next instr is branch *)
let val size = case x of ((_,BEQ _)::rest) => 2
| ((_,BCOP1 _)::rest) => 2
| _ => 1
in if emit_now then (emit(Opcodes.bltzal(0,0));
if size=2 then emit(Opcodes.add(0,0,0)) else ())
else ();
gen(i+size, SOME (i+2), x)
end
and gen(i,_,nil) = i
| gen(i, _, (_,DEFINE lab) :: rest) = (lab := i; gen(i,NONE, rest))
(* invalidate the pc pointer at labels *)
(* may want to do special fiddling with NOPs *)
| gen(pos, pcptr, x as ((sizeref as ref size, inst) :: rest)) =
if (pcptr=NONE andalso needs_a_pcptr(size, inst)) then makepcptr(pos,x)
else if emit_now
then
«emit MIPS instructions»
else
«compute positions»
in gen
end
@