Well this one is weird since fn2 isn't even called, but it can't be removed or the bad behavior goes away. [regehr@imp r104]$ clang -w -O0 small.c ; ./a.out ffffffff [regehr@imp r104]$ clang -w -O small.c ; ./a.out ff [regehr@imp r104]$ gcc -w -O small.c ; ./a.out ffffffff [regehr@imp r104]$ cat small.c int printf(const char *, ...); int a, c, d, e, f, g, i, j; short b; char h; int fn1(void) { return b && a > b || b < 0 && a ?: b; } void fn2(void) { for (;; f = fn1()) ; } void fn3(int p1) { j &&(c = 0 ?: 0); g = p1; } int main(void) { for (h = 0; h >= 0; ++h) { fn3(h); i = d >= 0 ?: 0; } e = g; e += h; printf("%x\n", e); return 0; } [regehr@imp r104]$ clang -v clang version 3.4 (trunk 191880) Target: x86_64-unknown-linux-gnu Thread model: posix Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.5 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.5.3 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
(In reply to comment #0) > Well this one is weird since fn2 isn't even called, but it can't be removed > or the bad behavior goes away. John, thanks for cc'ing me on this report. I suspect that the code is invalid because as far as I know the standard doesn't like code like "a ?: b", where the middle term is missing.
(In reply to comment #1) > (In reply to comment #0) > > Well this one is weird since fn2 isn't even called, but it can't be removed > > or the bad behavior goes away. > > John, thanks for cc'ing me on this report. I suspect that the code is > invalid because as far as I know the standard doesn't like code like "a ?: > b", where the middle term is missing. The missing argument to the ternary operator is a GCC idiom that I believe Clang attempts to implement faithfully. x?:z simply means x?x:z
(In reply to comment #2) > (In reply to comment #1) > > (In reply to comment #0) > > > Well this one is weird since fn2 isn't even called, but it can't be removed > > > or the bad behavior goes away. > > > > John, thanks for cc'ing me on this report. I suspect that the code is > > invalid because as far as I know the standard doesn't like code like "a ?: > > b", where the middle term is missing. > > The missing argument to the ternary operator is a GCC idiom that I believe > Clang attempts to implement faithfully. > > x?:z simply means x?x:z I see; thanks John. I was able to reduce it a bit more and got rid of fn2 (see below). -------------------------- int printf (const char *, ...); int a, c, d, e, g, i, j; short b; char h; int fn1 (void) { return a || (b && (a ? a : b)); } void fn3 (int p1) { j && (c = 0 ? 0 : 0); g = p1; } int main (void) { for (h = 0; h >= 0; ++h) { fn3 (h); i = d >= 0 ? d >= 0 : 0; } e = g; e += h; printf ("%x\n", e); return 0; }
(In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #1) > > > (In reply to comment #0) > > > > Well this one is weird since fn2 isn't even called, but it can't be removed > > > > or the bad behavior goes away. > > > > > > John, thanks for cc'ing me on this report. I suspect that the code is > > > invalid because as far as I know the standard doesn't like code like "a ?: > > > b", where the middle term is missing. > > > > The missing argument to the ternary operator is a GCC idiom that I believe > > Clang attempts to implement faithfully. > > > > x?:z simply means x?x:z > > I see; thanks John. > > I was able to reduce it a bit more and got rid of fn2 (see below). > > -------------------------- > > int printf (const char *, ...); > int a, c, d, e, g, i, j; > short b; > char h; > > int > fn1 (void) > { > return a || (b && (a ? a : b)); > } > > void > fn3 (int p1) > { > j && (c = 0 ? 0 : 0); > g = p1; > } > > int > main (void) > { > for (h = 0; h >= 0; ++h) > { > fn3 (h); > i = d >= 0 ? d >= 0 : 0; > } > e = g; > e += h; > printf ("%x\n", e); > return 0; > } Oh, fn1 should be removed too: int printf (const char *, ...); int a, c, d, e, g, i, j; short b; char h; void fn3 (int p1) { j && (c = 0 ? 0 : 0); g = p1; } int main (void) { for (h = 0; h >= 0; ++h) { fn3 (h); i = d >= 0 ? d >= 0 : 0; } e = g; e += h; printf ("%x\n", e); return 0; }
> Oh, fn1 should be removed too: Thanks! I was lazy and figured that if C-Reduce couldn't get rid of those functions, I couldn't either.
Here's a different, smaller test case that looks similar enough that I'll put it here instead of making a new PR. [regehr@imp r105]$ cat small.c int printf(const char *, ...); int a, b, c; int fn1(int p1) { --c; return 0; } int main() { b = 0; for (; b > -24; b--) { fn1(a && fn1(0)); } --c; printf("%d\n", c); return 0; } [regehr@imp r105]$ gcc -O small.c ; ./a.out -25 [regehr@imp r105]$ clang -O0 small.c ; ./a.out -25 [regehr@imp r105]$ clang -O1 small.c ; ./a.out -25 [regehr@imp r105]$ clang -O2 small.c ; ./a.out -22
(In reply to comment #6) > Here's a different, smaller test case that looks similar enough that I'll > put it here instead of making a new PR. > > [regehr@imp r105]$ cat small.c > int printf(const char *, ...); > int a, b, c; > int fn1(int p1) { > --c; > return 0; > } > > int main() { > b = 0; > for (; b > -24; b--) { > fn1(a && fn1(0)); > } > --c; > printf("%d\n", c); > return 0; > } > [regehr@imp r105]$ gcc -O small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O0 small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O1 small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O2 small.c ; ./a.out > -22 It should be different as only the trunk miscompiles it at -O2, while all versions of clang miscompile the initial testcase.
(In reply to comment #6) > Here's a different, smaller test case that looks similar enough that I'll > put it here instead of making a new PR. > > [regehr@imp r105]$ cat small.c > int printf(const char *, ...); > int a, b, c; > int fn1(int p1) { > --c; > return 0; > } > > int main() { > b = 0; > for (; b > -24; b--) { > fn1(a && fn1(0)); > } > --c; > printf("%d\n", c); > return 0; > } > [regehr@imp r105]$ gcc -O small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O0 small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O1 small.c ; ./a.out > -25 > [regehr@imp r105]$ clang -O2 small.c ; ./a.out > -22 Here is another testcase that should point to the same issue as the above testcase. Since both miscompilations go away with -fno-vectorize, this should be a vectorizer bug. ------------------------ int printf (const char *, ...); int a, b, c, d; int main () { for (a = 0; a < 3; a++) { c++; for (d = 0; d < 22; d++) { if (b) c++; c++; } } printf ("%d\n", c); return 0; }
The first three snippets are not related to the vectorizers: Compiling them with: > Debug+Asserts/bin/clang -O2 bug17473_4.c -emit-llvm -S shows no vectorized code. And compiling them with the vectorizers off shows the same failure behavior: > Debug+Asserts/bin/clang -O2 bug17473_3.c -fno-vectorize -fno-slp-vectorize > ./a.out ff The last two examples are indeed a bug in the loop vectorizer. I will commit a fix for them soon. Basically, we were taking a reduction like this: for.body3.1: ; preds = %for.body3.1, %for.inc7 %inc613.1 = phi i32 [ 0, %for.inc7 ], [ %inc6.1, %for.body3.1 ] %inc511.1 = phi i32 [ %inc.1, %for.inc7 ], [ %inc5.1, %for.body3.1 ] %2 = zext i1 %tobool to i32 %inc4.1 = xor i32 %2, 1 %inc511.1.inc4.1 = add nsw i32 %inc511.1, %inc4.1 %inc5.1 = add nsw i32 %inc511.1.inc4.1, 1 %inc6.1 = add nsw i32 %inc613.1, 1 %exitcond.1 = icmp eq i32 %inc6.1, 22 br i1 %exitcond.1, label %for.inc7.1, label %for.body3.1 for.inc7.1: ; preds = %for.body3.1 %inc.2 = add nsw i32 %inc511.1.inc4.1, 2 br label %for.body3.2 And where generating this. The problem is that the external user was not using the “last” value in the reduction. So we would loose 3 "(add 1)”s. vector.body23: ; preds = %vector.body23, %for.inc7 %index26 = phi i32 [ 0, %for.inc7 ], [ %index.next30, %vector.body23 ] %vec.phi34 = phi <4 x i32> [ %7, %for.inc7 ], [ %11, %vector.body23 ] %8 = zext <4 x i1> %broadcast.splat36 to <4 x i32> %9 = xor <4 x i32> %8, <i32 1, i32 1, i32 1, i32 1> %10 = add nsw <4 x i32> %vec.phi34, %9 %11 = add nsw <4 x i32> %10, <i32 1, i32 1, i32 1, i32 1> %index.next30 = add i32 %index26, 4 %12 = icmp eq i32 %index.next30, 20 br i1 %12, label %middle.block24, label %vector.body23 middle.block24: ; preds = %vector.body23 %rdx.shuf38 = shufflevector <4 x i32> %10, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> I have filed PR 17498 to track the vectorizer bug.
Hi. What's the status of this bug?
(In reply to comment #10) > Hi. What's the status of this bug? The test in comment 4 still reproduces: int printf(const char *, ...); int a, c, d, e, g, i, j; short b; char h; void fn3(int p1) { j &&(c = 0 ? 0 : 0); g = p1; } int main(void) { for (h = 0; h >= 0; ++h) { fn3(h); i = d >= 0 ? d >= 0 : 0; } e = g; e += h; printf("%x\n", e); return 0; } It prints different values at -O2 and -O0.
Looks like a bug in gvn /build/bin/clang test.c -o test.ll -m64 -emit-llvm -S -O0 ./build/bin/opt -S test.ll -o test2.ll -instcombine -inline -loop-rotate ./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn ./build/bin/llc test2.ll -o test2.s ./build/bin/llc test3.ll -o test3.s ./build/bin/clang test2.s -o test2 -m64 ./build/bin/clang test3.s -o test3 -m64 ./test2 ./test3 prints ffffffff ff
Created attachment 11530 [details] input to gvn Looks like ./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn misoptimizes this.
(In reply to comment #13) > Created attachment 11530 [details] > input to gvn > > Looks like > ./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn > > misoptimizes this. Actually, I think it's an LSR bug. Here's what I get: [morbo:llvm] opt -S -basicaa -gvn test2.ll -o - | llc -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2 ff [morbo:llvm] opt -S -basicaa -gvn test2.ll -o - | llc -disable-lsr -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2 ffffffff [morbo:llvm] opt -S -O2 test2.ll -o - | llc -disable-lsr -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2 ffffffff
<rdar://problem/15464466> PR17473: LSR miscompile
As far as I understand, the test exploits undefined behaviour (signed overflow), which can't be considered as a compiler bug.
Michael, UBSAN doesn't think there's a signed overflow, can you explain why you think there is one?
Hi John, You could add a printf into the loop and it would become apparent: int printf(const char *, ...); int a, c, d, e, g, i, j; short b; char h; void fn3(int p1) { j &&(c = 0 ? 0 : 0); g = p1; } int main(void) { for (h = 0; h >= 0; ++h) { fn3(h); i = d >= 0 ? d >= 0 : 0; printf("h=%x\n", h); // <<< This one } e = g; e += h; printf("%x\n", e); return 0; } The output would be: h=0 h=1 .. h=7e h=7f ffffffff Also one could figure that out from the loop condition: (h = 0; h >= 0; h++). Var h starts from 0 and increases until it becomes negative - that inevitably leads to overflow. As far as I understand, there should be an undefined behaviour.
Hi Michael, most people believe that ++h cannot overflow when h is a char because it is promoted to int, incremented, and then converted back to char. I believe this issue is not 100% clear in the standard, but my the view above is shared by the Clang developers, which explains why UBSAN does not complain here.
(In reply to comment #19) > Hi Michael, most people believe that ++h cannot overflow when h is a char > because it is promoted to int, incremented, and then converted back to char. > > I believe this issue is not 100% clear in the standard, but my the view > above is shared by the Clang developers, which explains why UBSAN does not > complain here. Yes, I agree with John. This also applies to signed short.
Good to know, thanks! But doesn't that mean that the loop should be infinite in such case? I.e. at O0 we also generate incorrect code.
> But doesn't that mean that the loop should be infinite in such case? I.e. at > O0 we also generate incorrect code. Sorry, that's incorrect. I mixed up signed and unsigned behaviour. I looked into LSR to figure out, why is this happening and here is what I found. When LSR replaces IV with a new one, it uses i32 instead of original i8. As far as I understand, it is the widest type used in computations involving the IV, and the reason why we can do this is that we later can simply ignore higher bits if we need a narrower type. That usually works fine, but in our case the ultimate use is also i32, so we don't add truncate before it, and that causes the fail. I.e. the correct conversion sequence here should be i32->i8->i32, but LSR currently loses the middle part and thus doesn't emit needed trunc/sext, which seems to be the rootcause.
Fixed in r203719.