- Newsgroups: gnu.gcc.help
- Path: sparky!uunet!munnari.oz.au!metro!allan
- From: allan@maths.su.oz.au (Allan Steel)
- Subject: Code generation question
- Message-ID: <allan.711853075@dumas>
- Summary: Why doesn't gcc unroll loops properly?
- Keywords: gcc,optimization,loop-unrolling
- Sender: allan@maths.su.oz.au
- Nntp-Posting-Host: dumas.maths.su.oz.au
- Organization: Sydney University Computing Service, Sydney, NSW, Australia
- Date: Thu, 23 Jul 1992 00:57:55 GMT
- Lines: 136
-
-
- I have some code like the following, which I am running on a Sun SPARC:
-
- #include <stdio.h>
- #include <stdlib.h>
-
- #define SIZE 100000
-
- int inner(int *xp, int *yp, int n)
- {
-     int *ep, v;
-
-     v = 0;
-     ep = xp + n;
-
-     while (xp < ep)
-         v ^= *xp++ & *yp++;
-
-     return v;
- }
-
- int main(void)
- {
-     int i, x[SIZE], y[SIZE];
-     unsigned s;  /* unsigned: the running sum wraps instead of hitting signed overflow */
-
-     for (i = 0; i < SIZE; i++)
-     {
-         x[i] = rand();
-         y[i] = rand();
-     }
-
-     s = 0;
-     for (i = 0; i < 200; i++)
-         s += inner(x, y, SIZE);
-
-     printf("%u\n", s);
-     return 0;
- }
-
- The code produced by gcc seems to be slower than that produced by
- the Sun compiler /bin/cc: on my machine the Sun-compiled program takes
- about 6.9 seconds on average, compared with 8.3 seconds for gcc.
-
- The assembly code produced for the function inner() is as follows:
-
- gcc -O2 -funroll-loops:
-
- _inner:
- !#PROLOGUE# 0
- !#PROLOGUE# 1
-         sll %o2,2,%o2
-         add %o0,%o2,%o2
-         cmp %o0,%o2
-         bgeu L3
-         mov 0,%o3
- L4:
-         ld [%o1],%g3
-         ld [%o0],%g2
-         add %o1,4,%o1
-         add %o0,4,%o0
-         cmp %o0,%o2
-         and %g2,%g3,%g2
-         blu L4
-         xor %o3,%g2,%o3
- L3:
-         retl
-         mov %o3,%o0
-
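For comparison, here is a hand transliteration of gcc's loop for inner() back into C. This is a reading of the assembly above, not something gcc emits, and the name inner_as_gcc is made up for illustration. It shows that gcc has already rotated the loop (one guard test before entry, the compare-and-branch at the bottom, and the xor placed in the delay slot of blu so it runs on every trip), but each trip still processes a single element, so the two pointer increments, the cmp and the blu are paid once per int: 8 instructions per element.

```c
/* Hypothetical name: a C reading of gcc's -O2 -funroll-loops output for inner(). */
int inner_as_gcc(const int *xp, const int *yp, int n)
{
    const int *ep = xp + n;   /* sll %o2,2,%o2 ; add %o0,%o2,%o2 */
    int v = 0;                /* mov 0,%o3 (delay slot of bgeu, always executed) */

    if (xp >= ep)             /* cmp %o0,%o2 ; bgeu L3 */
        return v;

    do {                      /* L4: one element per trip */
        int b = *yp++;        /* ld [%o1],%g3 ; add %o1,4,%o1 */
        int a = *xp++;        /* ld [%o0],%g2 ; add %o0,4,%o0 */
        v ^= a & b;           /* and %g2,%g3,%g2 ; xor in the delay slot of blu */
    } while (xp < ep);        /* cmp %o0,%o2 ; blu L4 */

    return v;                 /* retl ; mov %o3,%o0 */
}
```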
-
- cc -O4:
-
- _inner:
- !#PROLOGUE# 0
- !#PROLOGUE# 1
-         save %sp,-64,%sp
-         sll %i2,2,%i2
-         add %i0,%i2,%i2
-         cmp %i0,%i2
-         bcc L77006
-         mov 0,%i5
-         add %i0,12,%o2
-         cmp %o2,%i2
-         bcc,a LY2
-         ld [%i0],%i4
- L77003:
-         ld [%i0],%o3
-         ld [%i0+4],%o7
-         ld [%i0+8],%l2
-         ld [%i0+12],%l5
-         ld [%i1],%o4
-         ld [%i1+4],%l0
-         ld [%i1+8],%l3
-         ld [%i1+12],%l6
-         and %o3,%o4,%o3
-         xor %i5,%o3,%i5
-         and %o7,%l0,%o7
-         inc 16,%i0
-         add %i0,12,%i3
-         xor %i5,%o7,%i5
-         and %l2,%l3,%l2
-         xor %i5,%l2,%i5
-         cmp %i3,%i2
-         and %l5,%l6,%l5
-         inc 16,%i1
-         blu L77003
-         xor %i5,%l5,%i5
-         cmp %i0,%i2
-         bcc L77006
-         nop
- L77010:
-         ld [%i0],%i4
- LY2: ! [internal]
-         ld [%i1],%o0
-         inc 4,%i0
-         cmp %i0,%i2
-         and %i4,%o0,%i4
-         inc 4,%i1
-         bcs L77010
-         xor %i5,%i4,%i5
- L77006:
-         ret
-         restore %g0,%i5,%o0
-
-
- It seems that the Sun compiler unrolls the loop much better: its main
- loop (L77003) handles 4 ints per iteration, with a second loop (L77010)
- to pick up the remaining 0 to 3, whereas gcc's loop (L4) handles just 1.
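Counting the instructions inside each inner loop gives a rough measure of the difference. This ignores the Sun code's cleanup loop (at most 3 trips per call) and its save/restore, and treats every instruction as equal cost, so it is only an estimate:

```latex
\begin{aligned}
\text{gcc, loop L4:}\quad
  & \frac{2\,\mathtt{ld} + 2\,\mathtt{add} + \mathtt{cmp} + \mathtt{and} + \mathtt{blu} + \mathtt{xor}}{1\ \text{element}}
  = 8\ \text{instructions/element} \\[4pt]
\text{cc, loop L77003:}\quad
  & \frac{8\,\mathtt{ld} + 4\,\mathtt{and} + 4\,\mathtt{xor} + 2\,\mathtt{inc} + \mathtt{add} + \mathtt{cmp} + \mathtt{blu}}{4\ \text{elements}}
  = \frac{21}{4} = 5.25\ \text{instructions/element} \\[4pt]
\text{ratio:}\quad
  & \frac{8}{5.25} \approx 1.52
  \qquad\text{versus the measured}\qquad
  \frac{8.3\ \text{s}}{6.9\ \text{s}} \approx 1.20
\end{aligned}
```

The measured ratio is smaller than the instruction-count ratio, presumably because costs that instruction counts ignore, such as load latency and cache misses on the two 400,000-byte arrays, account for a good part of the run time.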
-
- I compiled with -O2 and -funroll-loops, which I believe give the most
- optimization, and yet gcc does not unroll the loop the way the Sun
- compiler does!
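Until gcc does this itself, one possible workaround is to unroll inner() by hand in the same shape as the Sun output: a main loop that handles 4 ints per trip while at least 4 remain, followed by a cleanup loop for the last 0 to 3 elements. This is a sketch only; the name inner_unrolled4 is hypothetical, and whether it actually closes the timing gap with a given gcc version on a given machine would need to be measured.

```c
/* Hypothetical hand-unrolled variant of inner(), mirroring the shape of
   the Sun cc -O4 output: a 4-way main loop plus a 1-way cleanup loop. */
int inner_unrolled4(const int *xp, const int *yp, int n)
{
    const int *ep = xp + n;
    int v = 0;

    /* Main loop: 4 elements per trip while at least 4 remain. */
    while (ep - xp >= 4) {
        v ^= xp[0] & yp[0];
        v ^= xp[1] & yp[1];
        v ^= xp[2] & yp[2];
        v ^= xp[3] & yp[3];
        xp += 4;
        yp += 4;
    }

    /* Cleanup loop: the remaining 0 to 3 elements, one at a time. */
    while (xp < ep)
        v ^= *xp++ & *yp++;

    return v;
}
```

Testing ep - xp >= 4, rather than comparing xp + 3 against ep, avoids forming a pointer beyond one-past-the-end of the array when fewer than 3 elements remain, which C does not guarantee to be valid; the Sun code does the equivalent comparison (add %i0,12,%i3 ; cmp %i3,%i2) in registers, where that concern does not arise. Because xor is associative and commutative, the result is the same as inner() for every n.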
-
- Does anyone have any ideas why this is so?
-
- Allan
- --
- +------------------------------------------------------------+
- | Allan Steel allan@maths.su.oz.au |
- | School of Mathematics and Statistics, University of Sydney |
- +------------------------------------------------------------+
-