- Newsgroups: comp.arch
- Path: sparky!uunet!think.com!spool.mu.edu!umn.edu!news.orst.edu!pmontgom
- From: pmontgom@math.orst.edu (Peter Montgomery)
- Subject: Re: register + register addressing
- Message-ID: <BxIFC5.BEr@news.orst.edu>
- Sender: usenet@news.orst.edu
- Nntp-Posting-Host: lab12.math.orst.edu
- Organization: Oregon State University Math Department
- References: <18938@ucdavis.ucdavis.edu> <endecotp.721329802@cs.man.ac.uk>
- Date: Tue, 10 Nov 1992 17:02:26 GMT
- Lines: 71
-
- In article <endecotp.721329802@cs.man.ac.uk> endecotp@cs.man.ac.uk
- (PB Endecott (PhD SFurber)) writes:
- >
- >Although it's true that both register+offset and register+register modes
- >require an addition, you haven't allowed for the fact that an extra
- >register read has to take place. Normally most microprocessors have two
- >read ports and one write port on the register file, which is exactly what
- >is required for three address arithmetic/logical operations. When you
- >execute a store instruction, one read port is used for the data value, and
- >the other for the address register.
- >
- ...
- >
- >Of course for a load, you do have two read ports available. Would anyone
- >consider an architecture with non-symmetrical addressing modes, where loads
- >can do register+constant or register+register, but stores can do
- >register+constant only?
- >
- >Another feature that some processors have and others don't is
- >auto-indexing. During loads, this requires an extra write port (or an
- >extra cycle) to put the modified value back in the register; but during
- >stores the write port is not used for data. So how about an architecture
- >with autoindexing for stores but not for loads?
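The port-budget argument in the quoted text can be made concrete with a small table. The following C sketch is illustrative only: the instruction descriptions, the `PortUse` struct and `fits_one_cycle` are hypothetical names, and it assumes a register file with two read ports and one write port, with each instruction using that file once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: register-file ports one instruction needs,
 * assuming it must access the register file in a single cycle. */
typedef struct {
    const char *name;
    int reads;   /* register-file read ports used */
    int writes;  /* register-file write ports used */
} PortUse;

/* ASSUMED register file: 2 read ports, 1 write port. */
enum { READ_PORTS = 2, WRITE_PORTS = 1 };

static const PortUse kInstr[] = {
    {"add   reg, reg, reg",          2, 1},
    {"load  reg+constant",           1, 1},
    {"load  reg+reg",                2, 1},
    {"load  reg+constant, autoinc",  1, 2}, /* loaded value + updated base */
    {"store reg+constant",           2, 0}, /* data + base */
    {"store reg+reg",                3, 0}, /* data + two address registers */
    {"store reg+constant, autoinc",  2, 1}, /* write port free for the base */
};

/* True if the instruction fits the assumed 2-read/1-write budget. */
static bool fits_one_cycle(const PortUse *p) {
    assert(p != NULL);
    return p->reads <= READ_PORTS && p->writes <= WRITE_PORTS;
}
```

Under these assumptions, the register+register store and the auto-incrementing load are the two entries that exceed the budget, which is the asymmetry the quoted proposal exploits.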
-
- The loops where I have found register+register indexing most useful
- are those resembling
-
- for I from 0 to N do
- A(I) := B(I) + C(I)*D(I)
- end for
-
- All arrays have the same data type, say 4 bytes per item.
- You initialize four registers with the addresses of A, B, C, D
- and put 4*I into another. Only one increment (of 4*I) is needed
- per iteration, not four separate increments of &A(I), &B(I), &C(I),
- and &D(I). When this code is nested inside another loop,
- the registers containing the base addresses won't need
- re-initialization every time, since they are not modified.
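A minimal C sketch of the register+register loop, assuming 4-byte `int32_t` items and emulating each base+index access with `memcpy` from a byte pointer. The helpers `load_rr`, `store_rr` and `kernel_rr` are hypothetical names. As in the pseudocode, the loop runs for I = 0..N, so each array must hold N+1 items; only the byte offset `off` (4*I) changes per iteration while the four base pointers stay fixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulated register+register load: effective address = base + off. */
static int32_t load_rr(const char *base, size_t off) {
    int32_t v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Emulated register+register store. */
static void store_rr(char *base, size_t off, int32_t v) {
    memcpy(base + off, &v, sizeof v);
}

/* A(I) := B(I) + C(I)*D(I) for I = 0..N (N+1 items per array). */
void kernel_rr(int32_t *A, const int32_t *B, const int32_t *C,
               const int32_t *D, size_t N) {
    char *a = (char *)A;                 /* base registers: never modified */
    const char *b = (const char *)B;
    const char *c = (const char *)C;
    const char *d = (const char *)D;
    const size_t step = sizeof(int32_t); /* 4 bytes per item */
    const size_t last = step * N;        /* 4*N */
    assert(A != NULL && B != NULL && C != NULL && D != NULL);
    /* A single increment (of 4*I) per iteration serves all four arrays. */
    for (size_t off = 0; off <= last; off += step) {
        store_rr(a, off, load_rr(b, off) + load_rr(c, off) * load_rr(d, off));
    }
}
```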
-
- The drawback is less freedom in the scheduling
- of instructions, since the increment of 4*I and the compare
- of 4*I to 4*N must wait until after the main computation
- of B(I) + C(I)*D(I).
-
- With more care, we can overcome this hurdle by using
- register+register indexing only on loads. The code would be
- (assuming array subscripts start at 0):
-
- r1 = B-A+4
- r2 = C-A+4
- r3 = D-A+4
- r4 = A-4 (will be A+4*(I-1) at the top of iteration I)
- r5 = A+4*N
-
- while (r4 < r5) do
- temp := load(r1+r4) + load(r2+r4)*load(r3+r4)
- increment r4, start compare against r5
- store(r4) := temp
- end while
-
- (auto-indexing on store could also be used here)
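The same scheme in C, as a sketch only. Since r1 + r4 = (B-A+4) + (A+4*(I-1)) = B+4*I, each load reaches B(I), C(I) or D(I), while the store needs only r4. Subtracting pointers into different arrays is undefined in C, so this ASSUMES a flat address space and does the arithmetic on `uintptr_t` (converting between pointers and `uintptr_t` is implementation-defined). `kernel_load_rr`, `LOAD32` and `STORE32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: flat address space; pointer <-> uintptr_t round-trips. */
#define LOAD32(addr)       (*(const int32_t *)(uintptr_t)(addr))
#define STORE32(addr, val) (*(int32_t *)(uintptr_t)(addr) = (val))

/* A(I) := B(I) + C(I)*D(I) for I = 0..N. Loads use register+register
 * addressing; the store uses a single address register. */
void kernel_load_rr(int32_t *A, const int32_t *B, const int32_t *C,
                    const int32_t *D, size_t N) {
    assert(A != NULL && B != NULL && C != NULL && D != NULL);
    /* Unsigned arithmetic wraps modulo 2^k, so B-A+4 works even if B < A. */
    const uintptr_t r1 = (uintptr_t)B - (uintptr_t)A + 4;
    const uintptr_t r2 = (uintptr_t)C - (uintptr_t)A + 4;
    const uintptr_t r3 = (uintptr_t)D - (uintptr_t)A + 4;
    uintptr_t r4 = (uintptr_t)A - 4;          /* A+4*(I-1) at top of iteration I */
    const uintptr_t r5 = (uintptr_t)A + 4 * N;
    while (r4 < r5) {
        /* r1+r4 == &B(I), r2+r4 == &C(I), r3+r4 == &D(I) */
        int32_t temp = LOAD32(r1 + r4) + LOAD32(r2 + r4) * LOAD32(r3 + r4);
        r4 += 4;           /* now r4 == &A(I); an auto-indexed store could fold this in */
        STORE32(r4, temp); /* store reads only two registers: data and address */
    }
}
```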
-
- Register+register addressing is inadequate if one wants
- to load a datum and also its neighbor. The MIPS R3000, for example,
- lacks double precision (8-byte) loads and stores. Rather, the
- upper half of the datum and its lower half are loaded separately.
- If one half is loaded with register+register, the other half
- will need register+register+offset.
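A sketch of the two-halves problem in C, assuming an ISA whose widest load is 4 bytes. `load_word` and `load_double_halves` are hypothetical helpers; the `disp` argument plays the role of the extra constant offset that plain register+register addressing lacks. The halves are reassembled by copying bytes in address order, so the sketch does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte load at base + index + disp. */
static uint32_t load_word(const unsigned char *base, size_t index, size_t disp) {
    uint32_t w;
    memcpy(&w, base + index + disp, sizeof w);
    return w;
}

/* Fetch an 8-byte double as two 4-byte words. */
double load_double_halves(const void *base, size_t index) {
    const unsigned char *p = (const unsigned char *)base;
    uint32_t first  = load_word(p, index, 0); /* register+register */
    uint32_t second = load_word(p, index, 4); /* register+register+offset */
    unsigned char bytes[sizeof(double)];
    assert(sizeof(double) == 2 * sizeof(uint32_t));
    memcpy(bytes, &first, sizeof first);       /* lower-addressed half */
    memcpy(bytes + 4, &second, sizeof second); /* higher-addressed half */
    double d;
    memcpy(&d, bytes, sizeof d);
    return d;
}
```

Without a register+register+offset mode, the second word would cost an extra add to form base+index+4.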
- --
- Peter L. Montgomery Internet: pmontgom@math.orst.edu
- Dept. of Mathematics, Oregon State Univ, Corvallis, OR 97331-4605 USA
-