- Xref: sparky comp.sys.super:1205 comp.arch:12396 comp.compilers:2260
- Newsgroups: comp.sys.super,comp.arch,comp.compilers
- Path: sparky!uunet!world!iecc!compilers-sender
- From: pmontgom@math.orst.edu (Peter Montgomery)
- Subject: Re: How many vector registers are useful?
- Reply-To: pmontgom@math.orst.edu (Peter Montgomery)
- Organization: Oregon State University Math Department
- Date: Tue, 26 Jan 1993 03:01:16 GMT
- Approved: compilers@iecc.cambridge.ma.us
- Message-ID: <93-01-188@comp.compilers>
- Keywords: architecture, question
- References: <93-01-174@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Lines: 65
-
- kirchner@uklira.informatik.uni-kl.de (Reinhard Kirchner) writes:
- >While discussing the merits of different vector machines, we came upon
- >the issue of register architectures. On one side there are the Cray and
- >Convex machines with 8 vector registers of 64 or 128 words each, and on
- >the other side the Fujitsu machines with their reconfigurable register
- >file of 32 or 64 KB, which is 4K or 8K words, and can be grouped as
- >anything from 256 registers of 16/32 words to 8 registers of 512/1024
- >words.
- >
- >Now there is the question: is such a large register file useful at all?
-
- I used an Alliant FX/80 while at UCLA. It had eight vector
- registers, each of length 32, which could hold integer or floating-point
- operands.
-
- One time-critical routine in my program was multiple-precision
- modular multiplication. The assembly language loop which multiplied one
- vector of length <= 32 by another such vector had enough vector registers,
- but there were too few vector registers for another loop which
- multiplied two vectors of length <= 64. These loops also faced a shortage
- of scalar integer registers (the Motorola 68020 has 8 address and 8 data
- registers), requiring me to use a floating-point register for one of the
- loop control variables. I guess that 16 or 32 vector registers will be
- adequate for most applications.
-
- >But how is this on vector machines? The register creates a speedup only
- >when it can hold an entire vector, which can be used again later. This
- >requires a register long enough to do so. That means vectors of, e.g.,
- >length 5000 cannot be held anyway; every machine must load, process,
- >and store them in pieces, and only a lot of memory bandwidth helps.
-
- It is important to strip mine and re-use vectors. Consider
- evaluating a polynomial at 5000 points:
-
-     do i = 1, 5000
-         pvalue = p(degree)                  ! Leading coefficient
-         do j = degree-1, 0, -1
-             pvalue = pvalue*x(i) + p(j)     ! Horner's rule
-         end do
-         value(i) = pvalue
-     end do
-
- On a machine with vector length at most 64, the code can be
-
-     do ibeg = 1, 5000, 64
-         iend = MIN(ibeg + 63, 5000)         ! Last index of this strip
-         lng = iend - ibeg + 1               ! Strip length, at most 64
-         pvalue(1:lng) = p(degree)
-         do j = degree-1, 0, -1
-             pvalue(1:lng) = pvalue(1:lng)*x(ibeg:iend) + p(j)
-         end do
-         value(ibeg:iend) = pvalue(1:lng)
-     end do
-
- If pvalue(1:lng) and x(ibeg:iend) are assigned to vector registers across
- the j loop, then the only memory reference in that loop is the load of
- p(j). Loops like this (where I operate several times on one temporary
- vector, here pvalue) occurred in many parts of my code. Alas, the
- compiler installed at UCLA did not perform these optimizations.
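
For readers who want to try the strip-mined loop, here is a minimal self-contained sketch in free-form Fortran. It runs the scalar Horner loop and the strip-mined loop on the same data and checks that they agree. The degree (4), the coefficient values, the evaluation points, and the comparison tolerance are assumptions chosen only for illustration; none of them come from the post. Whether pvalue and x(ibeg:iend) actually stay in vector registers across the j loop depends on the compiler and machine, so this shows only the loop structure, not the register allocation.

```fortran
program strip_mined_horner
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 5000       ! number of evaluation points (from the post)
  integer, parameter :: vlen = 64      ! maximum vector length (from the post)
  integer, parameter :: degree = 4     ! assumed degree, for illustration only
  real(dp) :: p(0:degree), x(n), value(n), ref(n), pvalue(vlen), s
  integer :: i, j, ibeg, iend, lng

  ! Hypothetical test data: coefficients 1, 2, ..., degree+1 and points in (0, 1].
  do j = 0, degree
     p(j) = real(j + 1, dp)
  end do
  do i = 1, n
     x(i) = real(i, dp) / real(n, dp)
  end do

  ! Reference: scalar Horner's rule, one point at a time.
  do i = 1, n
     s = p(degree)                     ! leading coefficient
     do j = degree - 1, 0, -1
        s = s*x(i) + p(j)
     end do
     ref(i) = s
  end do

  ! Strip-mined version: process at most vlen points per outer iteration.
  do ibeg = 1, n, vlen
     iend = min(ibeg + vlen - 1, n)    ! last index of this strip
     lng = iend - ibeg + 1             ! strip length, at most vlen
     pvalue(1:lng) = p(degree)
     do j = degree - 1, 0, -1
        ! pvalue and x(ibeg:iend) are reused on every pass of this loop;
        ! only the scalar p(j) is new each time.
        pvalue(1:lng) = pvalue(1:lng)*x(ibeg:iend) + p(j)
     end do
     value(ibeg:iend) = pvalue(1:lng)
  end do

  ! Compare with a small relative tolerance (an assumption) in case the
  ! compiler rounds the two loop forms differently.
  if (maxval(abs(value - ref)) <= 1.0e-12_dp * maxval(abs(ref))) then
     print *, 'strip-mined and scalar results agree'
  else
     print *, 'results differ'
  end if
end program strip_mined_horner
```

With 5000 points and strips of 64, the outer loop runs 79 times: 78 full strips of 64 elements and a final strip of 8, which is why lng is recomputed for each strip instead of being fixed at 64.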
- --
- Peter L. Montgomery Internet: pmontgom@math.orst.edu
- Dept. of Mathematics, Oregon State Univ, Corvallis, OR 97331-4605 USA