NetNews Usenet Archive 1992 #16

Text File | 1992-07-22 | 2.7 KB | 150 lines

Newsgroups: gnu.gcc.help
Path: sparky!uunet!munnari.oz.au!metro!allan
From: allan@maths.su.oz.au (Allan Steel)
Subject: Code generation question
Message-ID: <allan.711853075@dumas>
Summary: Why doesn't gcc unroll loops properly?
Keywords: gcc,optimization,loop-unrolling
Sender: allan@maths.su.oz.au
Nntp-Posting-Host: dumas.maths.su.oz.au
Organization: Sydney University Computing Service, Sydney, NSW, Australia
Date: Thu, 23 Jul 1992 00:57:55 GMT
Lines: 136

I have some code like the following which I am using on a Sun Sparc:

    #define SIZE 100000

    int
    inner(xp, yp, n)
    int *xp, *yp, n;
    {
        int *ep, v;

        v = 0;
        ep = xp + n;
        while (xp < ep)
            v ^= *xp++ & *yp++;
        return v;
    }

    main()
    {
        int i, x[SIZE], y[SIZE], s;

        for (i = 0; i < SIZE; i++)
        {
            x[i] = rand();
            y[i] = rand();
        }

        s = 0;
        for (i = 0; i < 200; i++)
            s += inner(x, y, SIZE);
        printf("%d\n", s);
    }

The code produced by gcc seems to be slower than that produced by the
Sun compiler /bin/cc (on my machine the Sun code takes an average of
about 6.9 seconds compared to 8.3 for gcc). The assembly code produced
for the function inner() is as follows:

gcc -O2 -funroll-loops:

    _inner:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        sll %o2,2,%o2
        add %o0,%o2,%o2
        cmp %o0,%o2
        bgeu L3
        mov 0,%o3
    L4:
        ld [%o1],%g3
        ld [%o0],%g2
        add %o1,4,%o1
        add %o0,4,%o0
        cmp %o0,%o2
        and %g2,%g3,%g2
        blu L4
        xor %o3,%g2,%o3
    L3:
        retl
        mov %o3,%o0

cc -O4:

    _inner:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        save %sp,-64,%sp
        sll %i2,2,%i2
        add %i0,%i2,%i2
        cmp %i0,%i2
        bcc L77006
        mov 0,%i5
        add %i0,12,%o2
        cmp %o2,%i2
        bcc,a LY2
        ld [%i0],%i4
    L77003:
        ld [%i0],%o3
        ld [%i0+4],%o7
        ld [%i0+8],%l2
        ld [%i0+12],%l5
        ld [%i1],%o4
        ld [%i1+4],%l0
        ld [%i1+8],%l3
        ld [%i1+12],%l6
        and %o3,%o4,%o3
        xor %i5,%o3,%i5
        and %o7,%l0,%o7
        inc 16,%i0
        add %i0,12,%i3
        xor %i5,%o7,%i5
        and %l2,%l3,%l2
        xor %i5,%l2,%i5
        cmp %i3,%i2
        and %l5,%l6,%l5
        inc 16,%i1
        blu L77003
        xor %i5,%l5,%i5
        cmp %i0,%i2
        bcc L77006
        nop
    L77010:
        ld [%i0],%i4
    LY2:                            ! [internal]
        ld [%i1],%o0
        inc 4,%i0
        cmp %i0,%i2
        and %i4,%o0,%i4
        inc 4,%i1
        bcs L77010
        xor %i5,%i4,%i5
    L77006:
        ret
        restore %g0,%i5,%o0

It seems that the Sun compiler unfolds the loop much better by handling
4 ints at a time instead of just 1 which gcc does. I have -O2 and
-funroll-loops turned on - I believe they give the most optimization and
yet gcc is not unrolling the loop like the Sun compiler does!

Does anyone have any ideas why this is so?

Allan

-- 
+------------------------------------------------------------+
| Allan Steel                           allan@maths.su.oz.au |
| School of Mathematics and Statistics, University of Sydney |
+------------------------------------------------------------+
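
For readers comparing the two listings, the 4-at-a-time shape of the cc -O4 output can be written out by hand in C. The following is only a minimal sketch, assuming an ANSI C compiler. The names inner_simple and inner_unrolled4 are hypothetical and do not appear in the post. It illustrates the unrolling pattern (a main loop that runs while at least four elements remain, then a one-at-a-time cleanup loop) and is not a claim about what either compiler does internally.

```c
#include <assert.h>

/* Reference version: same logic as inner() in the post,
 * one element per iteration. (Hypothetical name.) */
int inner_simple(const int *xp, const int *yp, int n)
{
    const int *ep = xp + n;
    int v = 0;

    while (xp < ep)
        v ^= *xp++ & *yp++;
    return v;
}

/* Hand-unrolled by 4, mirroring the structure of the cc -O4 listing.
 * (Hypothetical name; an illustration, not compiler output.) */
int inner_unrolled4(const int *xp, const int *yp, int n)
{
    int v = 0;
    int i = 0;

    assert(n >= 0);

    /* Main loop: four elements per iteration, entered only while
     * at least four elements remain. */
    for (; i + 3 < n; i += 4) {
        v ^= xp[i]     & yp[i];
        v ^= xp[i + 1] & yp[i + 1];
        v ^= xp[i + 2] & yp[i + 2];
        v ^= xp[i + 3] & yp[i + 3];
    }

    /* Cleanup loop: the remaining 0 to 3 elements, one at a time. */
    for (; i < n; i++)
        v ^= xp[i] & yp[i];

    return v;
}
```

Calling inner_unrolled4(x, y, SIZE) in place of inner(x, y, SIZE) in the test program should give the same result, since XOR is associative and the elements are visited in the same order. Whether it runs faster depends on the compiler and machine.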