There seems to be a problem with how code is generated for a call after a loop is unrolled. Consider this C code (on a PowerBook G4 machine):

    typedef unsigned long int mp_limb_t;

    typedef struct {
      int _mp_alloc;
      int _mp_size;
      mp_limb_t *_mp_d;
    } __mpz_struct;

    typedef __mpz_struct mpz_t[1];
    typedef const __mpz_struct *mpz_srcptr;
    typedef __mpz_struct *mpz_ptr;

    void Foo(mpz_srcptr base) {
      unsigned i;
      mpz_t want;
      __gmpz_init(want);
      for (i = 0; i < 2; i++) {
        __gmpz_mul(want, want, base);
      }
      __gmpz_clear(want);
    }

    void Bar(mpz_srcptr base) {
      mpz_t want;
      __gmpz_init(want);
      __gmpz_mul(want, want, base);
      __gmpz_mul(want, want, base);
      __gmpz_clear(want);
    }

If you look at "testcase.llvm.s", you'll notice that, even though Foo and Bar compute exactly the same thing, Foo, because its loop has been unrolled, has an "implicit def" of R3 before the gmpz_clear call:

        mr r3, r29
        mr r4, r29
        mr r5, r30
        bl L___gmpz_mul$stub
        ;IMPLICIT_DEF_GPRC r3
        bl L___gmpz_clear$stub

while the Bar case has:

        mr r3, r29
        mr r4, r29
        mr r5, r30
        bl L___gmpz_mul$stub
        mr r3, r29
        bl L___gmpz_clear$stub

The Foo case would be fine if R3 weren't trashed by the calls, but it appears to be, so gmpz_clear receives a garbage pointer. My guess is that R3 should be marked as "clobbered" across calls.

The "mpz_clear" function is in a library and looks like this:

    void
    mpz_clear (mpz_ptr m)
    {
      (*__gmp_free_func) (m->_mp_d, m->_mp_alloc * BYTES_PER_MP_LIMB);
    }

Now, gcc produces code for Foo that looks like this:

    L2:
        addi r3,r1,56
        mr r5,r29
        mr r4,r3
        bl L___gmpz_mul$stub
        addic. r30,r30,-1
        bne- cr0,L2
        addi r3,r1,56
        bl L___gmpz_clear$stub

and for Bar that looks like this:

        addi r3,r1,56
        mr r5,r29
        mr r4,r3
        bl L___gmpz_mul$stub
        addi r3,r1,56
        bl L___gmpz_clear$stub

So it's doing the correct thing. We should too.

BTW, this is the only test which failed in the GMP testsuite. Woo!

-bw
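For anyone who wants to check the Foo/Bar equivalence without linking against GMP, here is a rough sketch. The three __gmpz_* functions below are hypothetical logging stand-ins, not the real GMP implementations, and record(), same_call_sequence() and the log layout are names made up for this illustration. It assumes the miscompile does not depend on the real library bodies; since the stubs sit in the same file, an optimizer may inline them, so moving them to a separate file may be needed to reproduce anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long int mp_limb_t;
typedef struct { int _mp_alloc; int _mp_size; mp_limb_t *_mp_d; } __mpz_struct;
typedef __mpz_struct mpz_t[1];
typedef const __mpz_struct *mpz_srcptr;
typedef __mpz_struct *mpz_ptr;

/* Hypothetical call log: each entry is op * 8 + flags, where the flag bits
   say whether each pointer argument was the one we expected. */
enum { OP_INIT = 1, OP_MUL, OP_CLEAR, MAX_CALLS = 16 };
static int log_buf[MAX_CALLS];
static int log_len;
static const __mpz_struct *last_init;     /* pointer most recently passed to init */
static const __mpz_struct *expected_base; /* the "base" argument given to Foo/Bar */

static void record(int op, int flags) {
  assert(log_len < MAX_CALLS);
  log_buf[log_len++] = op * 8 + flags;
}

/* Stand-ins for the GMP entry points (assumption: logging only, no arithmetic). */
void __gmpz_init(mpz_ptr x) {
  last_init = x;
  x->_mp_alloc = 0; x->_mp_size = 0; x->_mp_d = NULL;
  record(OP_INIT, 1);
}
void __gmpz_mul(mpz_ptr r, mpz_srcptr a, mpz_srcptr b) {
  record(OP_MUL, (r == last_init) | ((a == last_init) << 1) | ((b == expected_base) << 2));
}
void __gmpz_clear(mpz_ptr x) {
  record(OP_CLEAR, x == last_init);
}

/* Foo and Bar exactly as in the report. */
void Foo(mpz_srcptr base) {
  unsigned i;
  mpz_t want;
  __gmpz_init(want);
  for (i = 0; i < 2; i++) {
    __gmpz_mul(want, want, base);
  }
  __gmpz_clear(want);
}

void Bar(mpz_srcptr base) {
  mpz_t want;
  __gmpz_init(want);
  __gmpz_mul(want, want, base);
  __gmpz_mul(want, want, base);
  __gmpz_clear(want);
}

/* Returns 1 if Foo and Bar made the same calls with equivalent arguments. */
int same_call_sequence(void) {
  __mpz_struct base = {0, 0, NULL};
  int foo_log[MAX_CALLS];
  int foo_len;
  expected_base = &base;
  log_len = 0;
  Foo(&base);
  foo_len = log_len;
  memcpy(foo_log, log_buf, sizeof foo_log);
  log_len = 0;
  Bar(&base);
  return foo_len == log_len &&
         memcmp(foo_log, log_buf, (size_t)log_len * sizeof(int)) == 0;
}
```

Calling same_call_sequence() from a small main() should return 1 with a correct compiler. If the final __gmpz_clear in Foo is handed a stale R3 as in the listing above, its log entry would differ from Bar's and the function would return 0.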
Created attachment 824
The C code (reduced from GMP's t-pow.c file) that produces the Bus Error.
Created attachment 825
LLVM's assembly output
Created attachment 826
GCC's assembly output
Bill, thanks for tracking this! This test fails for me (x86/linux) too. Probably due to the same reason.
No prob :-) It's pretty onerous. It looks like we *are* marking the R3 register as being clobbered. I now think it's some strange loop weirdness. I'm not skilled enough with bugpoint to get it to whittle this down to the pass that could be causing the problem. -bw
Note that I'm on x86 :) So this seems to be some common codegen weirdness.
This is a bug in loop unroll rewriting LCSSA phi nodes.
Fixed, patch here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070430/049154.html

testcase here: Transforms/LoopUnroll/2007-05-05-UnrollMiscomp.ll

-Chris
Quicker next time! ;-) Thanks! -bw