Path: wupost!waikato!comp.vuw.ac.nz!actrix.gen.nz!hoult!Bruce
Newsgroups: comp.sys.powerpc,comp.arch.arithmetic,gnu.misc.discuss,comp.programming,sci.math,alt.sources
Subject: Re: ANSWER: algorithm to perform 64-bit / 32-bit signed & unsigned divide
Message-ID: <2861579804@hoult.actrix.gen.nz>
From: Bruce@hoult.actrix.gen.nz (Bruce Hoult)
Date: Mon, 5 Sep 1994 03:16:45 +1200 (NZST)
References: <usenet-0209941834550001@lowry.eche.ualberta.ca>
Lines: 44
Xref: wupost comp.sys.powerpc:28530 comp.arch.arithmetic:581 gnu.misc.discuss:18434 comp.programming:12169 sci.math:76349 alt.sources:11217
usenet@lowry.eche.ualberta.ca (Brian Lowry) writes:
> > dmult ;
> > MulLW R6,R3,R4 ; Generate low word of result
> > StW R6,4(R5) ; Save result
> > MulHWU R7,R3,R4 ; Generate high word of result
> > StW R7,0(R5) ; Save result
> > BLR ; Return to caller
>
> This code makes me curious about PPC assembly (I've read the manual and
> a book on it, but I guess the details don't stick until you write some
> code). I would have thought that there would be some way to get the 64bit
> value to an FP register and then write it out to memory, thus saving a
> very expensive extra memory write operation. Also, everything I know
> about pipelining suggests that reorganizing the code as

No. Unfortunately there is no way to move data between the integer and
FP registers without going through memory (cache).

Overall this is a good tradeoff: you rarely need such transfers, and
keeping the two units totally separate makes it simpler to push the clock
rate up, which makes the machine faster overall.

> MulLW R6,R3,R4 ; Generate low word of result
> MulHWU R7,R3,R4 ; Generate high word of result
> StW R6,4(R5) ; Save result
> StW R7,0(R5) ; Save result
>
> would be more efficient. If anyone can spare the time to clue me in on
> the finer points of PPC assembly, I'd appreciate it (not that I mind
> programming in C/C++, but occasionally it seems somewhat inefficient...)

You'd think so, but unfortunately on the PPC601 the integer mul instructions
aren't pipelined and they stall the next instruction until they have
completed.

If a future PPC chip has either a) pipelined integer mul, or b) two or more
integer ALUs capable of doing muls at the same time, then you would be
correct.

I'd tend to write it your way anyway, since even on the PPC601 it won't be
worse than the original, and it might one day be better.
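
For anyone following along without a PPC manual to hand, here is a rough C
sketch of what the quoted dmult routine computes. It is an illustration
only, not the original poster's code: the function signature and the
out[] array are assumptions made for the example, with out[0] standing in
for the word stored at 0(R5) and out[1] for the word stored at 4(R5).

```c
#include <stdint.h>

/* Hypothetical C analogue of the quoted dmult routine.
 * a and b play the role of R3 and R4; out plays the role of R5. */
static void dmult(uint32_t a, uint32_t b, uint32_t out[2])
{
    /* Full 64-bit unsigned product of the two 32-bit operands. */
    uint64_t product = (uint64_t)a * b;

    /* Both halves are computed first, then both are stored,
     * mirroring the reordered version discussed above. */
    uint32_t lo = (uint32_t)product;          /* low word, as MulLW gives   */
    uint32_t hi = (uint32_t)(product >> 32);  /* high word, as MulHWU gives */

    out[1] = lo;  /* corresponds to StW R6,4(R5) */
    out[0] = hi;  /* corresponds to StW R7,0(R5) */
}
```

The high word lands first in memory, which matches the big-endian layout
the assembly produces. Whether a compiler emits the two multiplies back to
back or interleaves them with the stores is up to the compiler.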
-- Bruce