ARM processors have a user mode and a number of privileged supervisor modes.
These are used as follows:
<DL>
<DT> IRQ
<DD>Entered when an Interrupt Request (IRQ) is triggered.
<DT> FIQ
<DD>Entered when a Fast Interrupt Request (FIQ) is triggered.
<DT> SVC
<DD>Entered when a Software Interrupt (SWI) is executed.
<DT> Undef
<DD>Entered when an Undefined instruction is executed. (On the ARM 2 and
3, SVC mode is entered instead.)
<DT> Abt
<DD>Entered when a memory access attempt is aborted by the memory manager
(e.g. MEMC or MMU), usually because an attempt is made to access
non-existent memory or to access memory from an insufficiently privileged
mode. (On the ARM 2 and 3, SVC mode is entered instead.)
</DL>
<P>
In each case, the processor also jumps to the appropriate hardware vector.
<P>
<HR><A NAME="Registers"><H2>
Registers
</H2></A>
<P>
The ARM 2 and 3 have 27 processor registers, each 32 bits wide, of which 16
are visible at any given time (which 16 depends on the processor mode).
These are referred to as R0-R15.
<P>
The ARM 6 and later have 31 such 32 bit registers; again, 16 of them are
visible at any given time.
<P>
R15 has special significance. On the ARM 2 and 3, 24 bits are used as the
program counter, and the remaining 8 bits hold the processor mode, the
status flags and the interrupt disable flags. R15 is therefore often
referred to as PC.
<PRE>
R15 = PC = NZCVIFpp pppppppp pppppppp ppppppMM
</PRE>
Bits 0-1 and 26-31 are known as the PSR (processor status
register). Bits 2-25 give the address (in words) of the instruction
currently being fetched into the execution pipeline (see below). Thus
instructions are only ever executed from word aligned addresses.
<PRE>
MM  Current processor mode
0   User Mode
1   Fast interrupt processing mode (FIQ mode)
2   Interrupt processing mode (IRQ mode)
3   Supervisor mode (SVC mode)
</PRE>
<PRE>
Name  Meaning
N     Negative flag
Z     Zero flag
C     Carry flag
V     oVerflow flag
I     Interrupt request disable
F     Fast interrupt request disable
</PRE>
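<P>
To make the 26 bit R15 layout concrete, here is a minimal C sketch that splits an R15 value into its fields and packs them back together. The names R15Fields, decode_r15_26 and encode_r15_26 are hypothetical; the sketch simply follows the bit positions given above (NZCVIF in bits 26-31, the PC in bits 2-25, the mode in bits 0-1) and is an illustration, not part of any Acorn or ARM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field view of a 26 bit R15:
   bits 31-26 = N Z C V I F, bits 25-2 = PC (word address), bits 1-0 = mode. */
typedef struct {
    bool n, z, c, v;   /* condition flags */
    bool i, f;         /* IRQ and FIQ disable bits */
    uint32_t pc;       /* byte address of the instruction being fetched */
    unsigned mode;     /* 0 = USR, 1 = FIQ, 2 = IRQ, 3 = SVC */
} R15Fields;

static R15Fields decode_r15_26(uint32_t r15)
{
    R15Fields out;
    out.n = (r15 >> 31) & 1u;
    out.z = (r15 >> 30) & 1u;
    out.c = (r15 >> 29) & 1u;
    out.v = (r15 >> 28) & 1u;
    out.i = (r15 >> 27) & 1u;
    out.f = (r15 >> 26) & 1u;
    /* Bits 2-25 left in place already form the byte address. */
    out.pc = r15 & 0x03FFFFFCu;
    out.mode = r15 & 3u;
    return out;
}

/* Pack the fields back into one word. The PC must be word aligned and lie
   in the lowest 64MBytes, otherwise it would overlap the PSR bits. */
static uint32_t encode_r15_26(R15Fields f)
{
    assert((f.pc & ~0x03FFFFFCu) == 0);
    assert(f.mode <= 3u);
    return ((uint32_t)f.n << 31) | ((uint32_t)f.z << 30) |
           ((uint32_t)f.c << 29) | ((uint32_t)f.v << 28) |
           ((uint32_t)f.i << 27) | ((uint32_t)f.f << 26) |
           f.pc | f.mode;
}
```

<P>
For example, decoding 0x8C008003 reports N, I and F set, Z, C and V clear, a PC of 0x8000 and mode 3 (SVC mode). Masking with 0x03FFFFFC rather than shifting gives the byte address directly, because bits 2-25 already sit in the right place.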
<P>
R14, R14_FIQ, R14_IRQ and R14_SVC are sometimes known as "link"
registers, because the branch with link instruction (BL) stores its
return address in R14 of the current mode.
<P>
The ARM 6 and later processor cores support a 32 bit address space. Such
processors can operate in both 26 bit and 32 bit PC modes. In 26 bit PC
mode, R15 acts as on previous processors, and hence code can only be run in
the lowest 64MBytes of the address space. In 32 bit PC mode, all 32 bits of
R15 are used as the program counter. Separate status registers are used to
store the processor mode and status flags. These are defined as follows:
<PRE>
NZCVxxxx xxxxxxxx xxxxxxxx IFxMMMMM
</PRE>
Note that the bottom two bits of R15 are always zero in 32 bit PC mode, so
instructions can still only be fetched from word-aligned addresses. Any
attempt to write non-zero values to these bits is ignored.
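<P>
The 32 bit status register layout can be handled the same way. This is a hedged C sketch with hypothetical names (PsrFields, decode_psr32, psr32_with_mode, r15_32_value); it assumes only the bit positions shown above, treats the bits marked x as don't-care, and models the rule that the bottom two bits of R15 read as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 31-28 = N Z C V, bit 7 = I, bit 6 = F, bits 4-0 = M. */
typedef struct {
    bool n, z, c, v;   /* condition flags */
    bool i, f;         /* IRQ and FIQ disable bits */
    unsigned mode;     /* 5 bit M field */
} PsrFields;

static PsrFields decode_psr32(uint32_t psr)
{
    PsrFields out;
    out.n = (psr >> 31) & 1u;
    out.z = (psr >> 30) & 1u;
    out.c = (psr >> 29) & 1u;
    out.v = (psr >> 28) & 1u;
    out.i = (psr >> 7) & 1u;
    out.f = (psr >> 6) & 1u;
    out.mode = psr & 0x1Fu;
    return out;
}

/* Replace the M field, leaving the flags and interrupt bits alone. */
static uint32_t psr32_with_mode(uint32_t psr, unsigned mode)
{
    assert(mode <= 0x1Fu);   /* M is only 5 bits wide */
    return (psr & ~0x1Fu) | mode;
}

/* In 32 bit PC mode all of R15 is the PC, but the bottom two bits always
   read as zero, so a value written there comes back word aligned. */
static uint32_t r15_32_value(uint32_t written)
{
    return written & ~3u;
}
```

<P>
For example, 0x600000D3 decodes as Z and C set, I and F set, and mode 10011, which the table below lists as svc_32.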
<P>
The following modes are currently defined:
<PRE>
M      Name    Meaning
00000  usr_26  26 bit PC User Mode
00001  fiq_26  26 bit PC FIQ Mode
00010  irq_26  26 bit PC IRQ Mode
00011  svc_26  26 bit PC SVC Mode
10000  usr_32  32 bit PC User Mode
10001  fiq_32  32 bit PC FIQ Mode
10010  irq_32  32 bit PC IRQ Mode
10011  svc_32  32 bit PC SVC Mode
10111  abt_32  32 bit PC Abt Mode
11011  und_32  32 bit PC Und Mode
</PRE>
<P>
Extrapolating from the above table, it might be expected that the
following two modes are also defined:
<PRE>
M      Name    Meaning
00111  abt_26  26 bit PC Abt Mode
01011  und_26  26 bit PC Und Mode
</PRE>
These are in fact undefined. If you <B>do</B> write 00111 or 01011 to
the mode bits, the resulting chip state won't be what you might expect:
it won't be a 26 bit privileged mode with the appropriate R13 and R14
swapped in.
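<P>
A lookup over the mode table might look like the following C sketch. The helpers psr_mode_name and psr_mode_from_name are hypothetical; the first returns NULL for any M value the table does not list, which includes the tempting but undefined 00111 and 01011 patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name of a 5 bit mode value from the table of defined modes, or NULL. */
static const char *psr_mode_name(unsigned m)
{
    assert(m <= 0x1Fu);   /* the M field is only 5 bits wide */
    switch (m) {
    case 0x00: return "usr_26";
    case 0x01: return "fiq_26";
    case 0x02: return "irq_26";
    case 0x03: return "svc_26";
    case 0x10: return "usr_32";
    case 0x11: return "fiq_32";
    case 0x12: return "irq_32";
    case 0x13: return "svc_32";
    case 0x17: return "abt_32";   /* 10111 */
    case 0x1B: return "und_32";   /* 11011 */
    default:   return NULL;       /* includes 00111 and 01011 */
    }
}

/* Reverse lookup: mode bits for a name, or -1 if the name is not defined. */
static int psr_mode_from_name(const char *name)
{
    for (unsigned m = 0; m <= 0x1Fu; m++) {
        const char *n = psr_mode_name(m);
        if (n != NULL && strcmp(n, name) == 0)
            return (int)m;
    }
    return -1;
}
```

<P>
Asking for "abt_26" therefore yields -1, since that name never appears in the table of defined modes.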
<P>
The following table shows which registers are available in which
processor modes:
<PRE>
+------+---------------------------------------+
| Mode | Registers available                   |
+------+---------------------------------------+
| USR  | R0 - R14                          R15 |
+------+---------+-----------------------------+
| FIQ  | R0 - R7 | R8_FIQ - R14_FIQ        R15 |
+------+---------+----+------------------------+
| IRQ  | R0 - R12     | R13_IRQ - R14_IRQ  R15 |
+------+--------------+------------------------+
| SVC  | R0 - R12     | R13_SVC - R14_SVC  R15 |
+------+--------------+------------------------+
| ABT  | R0 - R12     | R13_ABT - R14_ABT  R15 | (ARM 6 and later only)
+------+--------------+------------------------+
| UND  | R0 - R12     | R13_UND - R14_UND  R15 | (ARM 6 and later only)
+------+--------------+------------------------+
</PRE>
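<P>
One way to picture the banking is to give every physical register an index and map each (mode, register number) pair onto it. The C sketch below does that; the numbering of the physical registers is an arbitrary choice made for illustration and does not describe how the hardware is actually organised, and the names Mode and physical_register are hypothetical. Counting the indices reproduces the totals given earlier: 27 registers without the ABT and UND banks, 31 with them.

```c
#include <assert.h>

/* Modes in the order used by the table above. */
typedef enum { MODE_USR, MODE_FIQ, MODE_IRQ, MODE_SVC, MODE_ABT, MODE_UND } Mode;

/* Map register r (0-15) as seen in a given mode to a hypothetical flat index:
     0-15   R0-R15 (user bank; R15 is shared by every mode)
     16-22  R8_FIQ-R14_FIQ
     23-24  R13_IRQ-R14_IRQ, 25-26 SVC, 27-28 ABT, 29-30 UND */
static int physical_register(Mode mode, int r)
{
    assert(r >= 0 && r <= 15);
    if (r == 15)
        return 15;                               /* never banked */
    if (mode == MODE_FIQ && r >= 8)
        return 16 + (r - 8);                     /* FIQ banks R8-R14 */
    if (mode != MODE_USR && mode != MODE_FIQ && r >= 13)
        return 23 + 2 * (int)(mode - MODE_IRQ) + (r - 13); /* R13, R14 only */
    return r;                                    /* shared with user mode */
}
```

<P>
So R8 in FIQ mode and R8 in IRQ mode are different physical registers, while R13 in IRQ mode and R13 in SVC mode are also distinct from each other and from the user R13.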
<P>
There are six status registers on the ARM6 and later processors. One is
the current processor status register (CPSR) and holds information about
the current state of the processor. The other five are the saved processor
status registers (SPSRs): there is one of these for each privileged mode,
to hold information about the state the processor must be returned to when
exception handling in that mode is complete.
<P>
These registers are set and read using the MSR and MRS instructions
respectively.
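<P>
The relationship between the CPSR and the SPSRs can be sketched as a small C model: on entry to an exception the CPSR is copied into the SPSR of the mode being entered, and on return that SPSR is copied back. The type and function names are hypothetical. Setting the I bit on entry reflects standard ARM behaviour that this section does not describe; the model deliberately ignores the F bit, R14, the branch to the vector and the 26 bit modes.

```c
#include <assert.h>
#include <stdint.h>

/* One SPSR per privileged mode. */
enum { SPSR_FIQ, SPSR_IRQ, SPSR_SVC, SPSR_ABT, SPSR_UND, NUM_SPSR };

typedef struct {
    uint32_t cpsr;
    uint32_t spsr[NUM_SPSR];
} StatusRegs;

/* 32 bit PC mode numbers for the privileged modes (fiq_32 ... und_32). */
static const uint32_t mode_bits[NUM_SPSR] = { 0x11u, 0x12u, 0x13u, 0x17u, 0x1Bu };

/* Which SPSR belongs to the mode currently in the CPSR, or -1 if none. */
static int spsr_index_for(uint32_t cpsr)
{
    for (int i = 0; i < NUM_SPSR; i++)
        if ((cpsr & 0x1Fu) == mode_bits[i])
            return i;
    return -1;   /* user mode has no SPSR */
}

/* Exception entry: save the interrupted state, switch mode, disable IRQs. */
static void take_exception(StatusRegs *s, int which)
{
    assert(which >= 0 && which < NUM_SPSR);
    s->spsr[which] = s->cpsr;
    s->cpsr = (s->cpsr & ~0x1Fu) | mode_bits[which] | (1u << 7);
}

/* Exception return: restore the state saved by the current mode's entry. */
static void return_from_exception(StatusRegs *s)
{
    int i = spsr_index_for(s->cpsr);
    assert(i >= 0);
    s->cpsr = s->spsr[i];
}
```

<P>
On the real processor the restore happens as part of an instruction such as MOVS PC,R14 executed in the exception mode, rather than as a separate step.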
<P>
<HR><A NAME="Pipeline"><H2>
Pipeline
</H2></A>
<P>
Rather than being a microcoded processor, the ARM is (in keeping with
its RISCness) entirely hardwired.
<P>
To speed execution, the ARM 2 and 3 have three-stage pipelines. The first
stage holds the instruction being fetched from memory, the second
starts decoding it, and the third is where it is actually
executed. As a result, the program counter is always 2 instructions
(8 bytes) beyond the currently executing instruction. (This must be taken
into account when calculating offsets for branch instructions.)
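<P>
The offset calculation can be sketched in C as follows. This assumes the usual branch encoding, in which B and BL hold a signed 24 bit offset in words measured from the branch's own address plus 8 (the value R15 has reached by the time the branch executes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 24 bit offset field for a branch at branch_addr that jumps to target. */
static uint32_t branch_offset_field(uint32_t branch_addr, uint32_t target)
{
    /* R15 reads two instructions (8 bytes) ahead of the branch itself. */
    int64_t byte_offset = (int64_t)target - ((int64_t)branch_addr + 8);
    assert(byte_offset % 4 == 0);                      /* word aligned */
    int64_t word_offset = byte_offset / 4;
    assert(word_offset >= -(1 << 23) && word_offset < (1 << 23));
    return (uint32_t)word_offset & 0x00FFFFFFu;        /* low 24 bits */
}

/* Inverse: the address a branch at branch_addr with this field jumps to. */
static uint32_t branch_target(uint32_t branch_addr, uint32_t field)
{
    assert(field <= 0x00FFFFFFu);
    /* Sign-extend the 24 bit field. */
    int32_t word_offset = (field & 0x00800000u) ? (int32_t)field - 0x01000000
                                                : (int32_t)field;
    return branch_addr + 8u + (uint32_t)word_offset * 4u;
}
```

<P>
For example, a branch at 0x8000 that jumps to itself needs an offset of -2 words, so its offset field is 0xFFFFFE; a field of zero means "continue two instructions further on".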
<P>
Because of this pipeline, 2 instruction cycles are lost on a branch
(as the pipeline must refill). It is therefore often preferable to
make use of conditional instructions to avoid wasting cycles. For
example:
<PRE>
...
CMP R0,#0
BEQ over
MOV R1,#1
MOV R2,#2
over
...
</PRE>
can be more efficiently written as:
<PRE>
...
CMP R0,#0
MOVNE R1,#1
MOVNE R2,#2
...
</PRE>
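<P>
The saving can be counted in S- and N-cycles. The C sketch below does this for both versions of the example; it assumes the usual ARM 2 figures of 1S for a data processing instruction such as CMP or MOV, 1S for an instruction whose condition fails, and 2S+1N for a taken branch. Those figures are assumptions here rather than something stated in this section, and the names are hypothetical.

```c
#include <assert.h>

/* Cost of a code sequence in S- and N-cycles. */
typedef struct { int s, n; } Cycles;

static Cycles cycles_add(Cycles a, Cycles b)
{
    return (Cycles){ a.s + b.s, a.n + b.n };
}

static const Cycles ONE_S = { 1, 0 };         /* CMP, MOV, or a failed condition */
static const Cycles BRANCH_TAKEN = { 2, 1 };  /* assumed 2S+1N */

/* CMP R0,#0 / BEQ over / MOV R1,#1 / MOV R2,#2 */
static Cycles branch_version(int r0)
{
    Cycles c = ONE_S;                               /* CMP */
    if (r0 == 0)
        return cycles_add(c, BRANCH_TAKEN);         /* BEQ taken, MOVs skipped */
    c = cycles_add(c, ONE_S);                       /* BEQ not taken */
    return cycles_add(cycles_add(c, ONE_S), ONE_S); /* two MOVs */
}

/* CMP R0,#0 / MOVNE R1,#1 / MOVNE R2,#2 */
static Cycles conditional_version(int r0)
{
    (void)r0;   /* each instruction costs 1S whether or not it executes */
    return cycles_add(cycles_add(ONE_S, ONE_S), ONE_S);
}

/* Convert to nanoseconds for given S- and N-cycle lengths. */
static int cycles_ns(Cycles c, int s_ns, int n_ns)
{
    assert(s_ns > 0 && n_ns >= s_ns);
    return c.s * s_ns + c.n * n_ns;
}
```

<P>
With the A440/1 figures given in the next section (S = 125ns, N = 250ns), the branch version takes 625ns when R0 is zero and 500ns otherwise, while the conditional version always takes 375ns.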
<P>
<HR><A NAME="Timings"><H2>
Timings
</H2></A>
<P>
ARM instructions are timed in a mixture of S, N, I and C cycles.
<P>
An S-cycle is a cycle in which the ARM accesses a sequential memory
location.
<P>
An N-cycle is a cycle in which the ARM accesses a non-sequential memory
location.
<P>
An I-cycle is a cycle in which the ARM doesn't try to access a memory
location or to transfer a word to or from a coprocessor.
<P>
A C-cycle is a cycle in which a word is transferred between the ARM and a
coprocessor on either the data bus (for uncached ARMs) or the coprocessor
bus (for cached ARMs).
<P>
The different types of cycle must all be at least as long as the minimum
cycle time implied by the ARM's rated clock speed (125ns for an 8MHz part).
The memory system can stretch them: with a typical DRAM system, this
results in:
<UL>
<LI>
N-cycles being twice the minimum length (essentially because
DRAMs require a longer access protocol when the memory access
is non-sequential).
<LI>
S-cycles usually being the minimum length, but occasionally
being stretched to N-cycle length (when you've just moved
sequentially from the last word of one memory "row" to the
first of the next one<A HREF="#Footnote1">[1]</A>).
<LI>
I- and C-cycles always being the minimum length.
</UL>
<P>
With a typical SRAM system, all four types of cycle are normally the
minimum length.
<P>
On the 8MHz ARM2 used in the Acorn Archimedes A440/1, an S
(sequential) cycle is 125ns and an N (non-sequential) cycle is
250ns. Note that these timings are <B>not</B> attributes of
the ARM, but of the memory system. For example, an 8MHz ARM2 can be
connected to a static RAM system that gives a 125ns N-cycle. The fact
that the processor is rated at 8MHz simply means that it isn't
guaranteed to work if any type of cycle is made shorter than 125ns.
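<P>
The arithmetic above can be captured in a few lines of C. This minimal sketch assumes the typical DRAM behaviour described (N-cycles twice the minimum, everything else at the minimum, ignoring the occasional stretched S-cycle) and the SRAM case where every cycle is at the minimum; the type and function names are hypothetical.

```c
#include <assert.h>

/* Length in ns of each cycle type. */
typedef struct { double s, n, i, c; } CycleLengths;

/* Typical DRAM: N-cycles twice the minimum, the rest at the minimum. */
static CycleLengths dram_cycles(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    double min_ns = 1000.0 / clock_mhz;   /* minimum cycle from clock rating */
    return (CycleLengths){ min_ns, 2.0 * min_ns, min_ns, min_ns };
}

/* Typical SRAM: every cycle type at the minimum length. */
static CycleLengths sram_cycles(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    double min_ns = 1000.0 / clock_mhz;
    return (CycleLengths){ min_ns, min_ns, min_ns, min_ns };
}

/* Time in ns for an instruction costing s S, n N, i I and c C cycles. */
static double instruction_ns(CycleLengths len, int s, int n, int i, int c)
{
    return s * len.s + n * len.n + i * len.i + c * len.c;
}
```

<P>
At 8MHz the DRAM model gives 125ns S-cycles and 250ns N-cycles, matching the A440/1 figures, so an instruction costing 2S+1N takes 500ns with DRAM but 375ns with SRAM.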
<P>
For cached processors, all the timing information given is in terms of the
clock cycles seen by the ARM. These do not occur at a constant rate: the
cache control logic changes the source of the clock cycles presented
to the ARM when cache misses occur.
<P>
Generally, a cached ARM has two clock inputs: the "fast clock" FCLK
and the "memory clock" MCLK. When operating normally from cache, the
ARM is clocked at FCLK speed and all types of cycle are the minimum
length: cache is effectively a type of SRAM from this point of
view. When a cache miss occurs, the ARM's clock is synchronised to
MCLK, then the cache line fill takes place at MCLK speed (taking
either N+3S or N+7S depending on the length of cache lines in the
processor involved), then the ARM's clock is resynchronised back to
FCLK.
<P>
While the memory access is taking place, the ARM is being clocked:
however, an input called NWAIT is used to cause the ARM cycles
involved not to do anything until the correct word arrives from
memory, and usually not to do anything while the remaining words
arrive (to avoid getting further memory requests while the cache is
still busy with the cache line refill). The situation is also
complicated by the fact that the cached ARM can be configured either
for FCLK and MCLK to be synchronous to each other (so FCLK is an exact
multiple of MCLK, and every MCLK clock cycle starts at just about the
same time as an FCLK cycle) or asynchronous (in which case FCLK and
MCLK cycles can have any relationship to each other).
<P>
All in all, the situation is therefore quite complicated. An approximation
to the behaviour is that when a cache line miss occurs, the cycle involved
takes the cache line refill time (i.e. N+3S or N+7S) in MCLK cycles, with
N-cycles and S-cycles probably being stretched as described above for DRAM,
plus a few more cycles to allow for the resynchronisation periods. For any
more details, you really need to get a datasheet for the processor involved.
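<P>
As a rough illustration of that approximation, the C sketch below estimates the stall for one cache line miss in nanoseconds. It assumes the line fill is one DRAM N-cycle (twice the MCLK period) followed by S-cycles for the remaining words of a 4-word (N+3S) or 8-word (N+7S) line. The parameter resync_mclk, the number of extra MCLK cycles allowed for resynchronisation, is hypothetical and has to be estimated by the caller, since the real figure must come from the datasheet.

```c
#include <assert.h>

/* Approximate cost in ns of one cache line miss. */
static double cache_miss_ns(double mclk_mhz, int words_per_line, int resync_mclk)
{
    assert(mclk_mhz > 0.0);
    assert(words_per_line == 4 || words_per_line == 8);  /* N+3S or N+7S */
    assert(resync_mclk >= 0);
    double mclk_ns = 1000.0 / mclk_mhz;
    double n_cycle = 2.0 * mclk_ns;                    /* stretched as for DRAM */
    double s_cycles = (words_per_line - 1) * mclk_ns;  /* remaining words */
    return n_cycle + s_cycles + resync_mclk * mclk_ns; /* plus resync guess */
}
```

<P>
For example, with an 8MHz MCLK, a 4-word line and no resynchronisation allowance the estimate is 625ns; an 8-word line with two cycles allowed for resynchronisation gives 1375ns.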
<P>
<A NAME="Footnote1">Footnote 1:</A> Memory controllers tend to use this simple strategy: if an N-cycle is requested, treat the access as not being in the same row; if an S-cycle is requested, treat the access as being in the same row unless it has just crossed from the last word of one row into the next (which can be detected quickly). The net result is that <B>some</B> S-cycles will last the same time as an N-cycle; if I remember correctly, on an Archimedes these are S-cycle accesses to an address which is divisible by 16. The practical consequences of this for Archimedes code are: (a) about 1 in 4 S-cycles is stretched to N-cycle length, since all the addresses involved are word addresses and so already divisible by 4; (b) it is occasionally worth aligning code carefully to avoid this effect and gain some extra performance.
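<P>
The rule recalled in the footnote can be modelled directly. In this C sketch (hypothetical function names), an S-cycle to an address divisible by 16 is counted as stretched; the rule itself is an assumption carried over from the footnote's recollection, not a specification of MEMC. A sequential run is assumed to start with an N-cycle, which is not counted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed rule: an S-cycle to an address divisible by 16 lasts as long as
   an N-cycle, because it starts a new DRAM row. */
static int s_cycle_stretched(uint32_t addr)
{
    assert((addr & 3u) == 0);   /* word accesses only */
    return (addr & 15u) == 0;
}

/* Stretched S-cycles in a run of sequential word accesses starting at start.
   The first access of the run is an N-cycle and is not counted. */
static int stretched_in_run(uint32_t start, int words)
{
    int count = 0;
    for (int k = 1; k < words; k++)
        count += s_cycle_stretched(start + 4u * (uint32_t)k);
    return count;
}
```

<P>
Over the 16 sequential S-cycles following an access to 0x8000, four are stretched, the 1 in 4 figure above. A four-word run starting at 0x8000 has no stretched S-cycles, while the same run starting at 0x800C has one, which is why alignment can matter.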
<P>
<HR><A NAME="Instructions"><H2>
Instructions
</H2></A>
<P>
Each ARM instruction is 32 bits wide. The instruction classes are explained
in more detail below: for each class we give the instruction bitmap and
an example of the syntax used by a typical assembler.
<P>
It should of course be noted that the mnemonic syntax is not fixed; it
is a property of the assembler, not the ARM machine code.
<P>
<A NAME="Condition"><H3>
Condition Code
</H3></A>
<P>
The top nibble of every instruction is a condition code, so every