NetNews Usenet Archive 1992 #26

Text File | 1992-11-10 | 3.4 KB | 84 lines

Newsgroups: comp.arch
Path: sparky!uunet!think.com!spool.mu.edu!umn.edu!news.orst.edu!pmontgom
From: pmontgom@math.orst.edu (Peter Montgomery)
Subject: Re: register + register addressing
Message-ID: <BxIFC5.BEr@news.orst.edu>
Sender: usenet@news.orst.edu
Nntp-Posting-Host: lab12.math.orst.edu
Organization: Oregon State University Math Department
References: <18938@ucdavis.ucdavis.edu> <endecotp.721329802@cs.man.ac.uk>
Date: Tue, 10 Nov 1992 17:02:26 GMT
Lines: 71

In article <endecotp.721329802@cs.man.ac.uk> endecotp@cs.man.ac.uk (PB Endecott (PhD SFurber)) writes:
>
>Although it's true that both register+offset and register+register modes
>require an addition, you haven't allowed for the fact that an extra
>register read has to take place. Normally most microprocessors have two
>read ports and one write port on the register file, which is exactly what
>is required for three address arithmetic/logical operations. When you
>execute a store instruction, one read port is used for the data value, and
>the other for the address register.
>    ...
>
>Of course for a load, you do have two read ports available. Would anyone
>consider an architecture with non-symmetrical addressing modes, where loads
>can do register+constant or register+register, but stores can do
>register+constant only?
>
>Another feature that some processors have and others don't is
>auto-indexing. During loads, this requires an extra write port (or an
>extra cycle) to put the modified value back in the register; but during
>stores the write port is not used for data. So how about an architecture
>with autoindexing for stores but not for loads?

        The loops where I have found register+register indexing
most useful are those resembling

        for I from 0 to N do
            A(I) := B(I) + C(I)*D(I)
        end for

All arrays have the same data type, say 4 bytes per item.
You initialize four registers with the addresses of A, B, C, D
and put 4*I into another.
Only one increment (of 4*I) is needed per iteration, not four
separate increments of &A(I), &B(I), &C(I), and &D(I).
When this code is nested inside another loop, the registers
containing the base addresses won't need re-initialization
every time, since they are not modified. The drawback is less
freedom in the scheduling of instructions, since the increment
of 4*I and the compare of 4*I to 4*N must wait until after
the main computation of B(I) + C(I)*D(I).

        With more care, we can overcome this hurdle
and use register+register indexing only on loads.
The code would be (assuming array offsets start at 0):

        r1 = B-A+4
        r2 = C-A+4
        r3 = D-A+4
        r4 = A-4        (will be A+4*(I-1))
        r5 = A+4*N
        while (r4 < r5) do
            temp := load(r1+r4) + load(r2+r4)*load(r3+r4)
            increment r4 by 4, start compare against r5
            store(r4) := temp
        end while

(auto-indexing on store could also be used here)

        Register+register addressing is inadequate if one
wants to load a datum and also its neighbor. The MIPS R3000,
for example, lacks double precision (8-byte) loads and stores.
Rather, the upper half of the datum and its lower half are loaded
separately. If one half is loaded with register+register,
the other half will need register+register+offset.
-- 
        Peter L. Montgomery     Internet: pmontgom@math.orst.edu
        Dept. of Mathematics, Oregon State Univ, Corvallis, OR 97331-4605 USA
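
The two loop forms in the post can be checked against each other with a
small C model. This is only a sketch under stated assumptions, not code
from the post: memory is simulated as a flat byte array, "registers" are
uint32_t values holding byte addresses, and the names mem, load, store,
loop_shared_index, loop_reg_reg_loads and count_mismatches are made up
for illustration. It also assumes A sits at address 4 or higher, so that
A-4 does not wrap around before the compare against r5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated flat memory (an assumption of this sketch). */
static unsigned char mem[256];

/* 4-byte load from a byte address. */
int32_t load(uint32_t addr)
{
    int32_t v;
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

/* 4-byte store to a byte address. */
void store(uint32_t addr, int32_t v)
{
    memcpy(&mem[addr], &v, sizeof v);
}

/* First form: one shared offset 4*I, base+index on loads AND the store.
   a, b, c, d are byte addresses of the arrays; covers I = 0..n. */
void loop_shared_index(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                       uint32_t n)
{
    for (uint32_t off = 0; off <= 4 * n; off += 4)
        store(a + off, load(b + off) + load(c + off) * load(d + off));
}

/* Second form: base+index on loads only; the store uses r4 alone. */
void loop_reg_reg_loads(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                        uint32_t n)
{
    assert(a >= 4);               /* so that a - 4 does not wrap */

    uint32_t r1 = b - a + 4;      /* unsigned wraparound is well defined */
    uint32_t r2 = c - a + 4;
    uint32_t r3 = d - a + 4;
    uint32_t r4 = a - 4;          /* A + 4*(I-1) at the top of iteration I */
    uint32_t r5 = a + 4 * n;

    while (r4 < r5) {
        int32_t temp = load(r1 + r4) + load(r2 + r4) * load(r3 + r4);
        r4 += 4;                  /* does not depend on temp */
        store(r4, temp);          /* r4 is now A + 4*I */
    }
}

/* Runs both forms on the same inputs; returns the number of elements
   on which their results differ. */
int count_mismatches(void)
{
    enum { N = 7, BYTES = 4 * (N + 1) };
    const uint32_t a1 = 16, a2 = a1 + BYTES;
    const uint32_t b = a2 + BYTES, c = b + BYTES, d = c + BYTES;

    for (uint32_t i = 0; i <= N; i++) {
        store(b + 4 * i, (int32_t)i + 1);
        store(c + 4 * i, 2 * (int32_t)i - 3);
        store(d + 4 * i, 5 - (int32_t)i);
    }

    loop_shared_index(a1, b, c, d, N);
    loop_reg_reg_loads(a2, b, c, d, N);

    int bad = 0;
    for (uint32_t i = 0; i <= N; i++)
        bad += load(a1 + 4 * i) != load(a2 + 4 * i);
    return bad;
}
```

Calling count_mismatches() from a main() and checking for a zero return
is enough to exercise it. In the second form the increment of r4 depends
only on r4, so a scheduler is free to hoist it and the compare above the
multiply-add; a C compiler will of course make its own choices here.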