The following code is miscompiled by the current clang trunk, clang 3.2, and clang 3.3 on x86_64-linux-gnu at -O3 in 32-bit mode, resulting in a SIGFPE. $ clang-trunk -v clang version 3.4 (trunk 189735) $ clang-trunk -m32 -O2 small.c $ a.out $ clang-trunk -m32 -O3 small.c $ a.out Floating point exception (core dumped) $ clang-3.3 -m32 -O3 small.c $ a.out Floating point exception (core dumped) $ clang-3.2 -m32 -O3 small.c $ a.out Floating point exception (core dumped) $ clang-trunk -m64 -O3 small.c $ a.out $ ------------------------------------ int a, b, d, e, *f = &b, g, i; char c; int foo (int p1, long long p2) { return p2 == 0 ? 0 : p1 % p2; } void bar (int p) { lbl: if (*f) g = 0; int **l = &f; *l = 0; for (; i < 1; i++) { if (i) goto lbl; if (foo (2, d || p)) { int **m = &f; *m = &e; } } } int main () { bar (&c == (void *)&a); return 0; }
The SIGFPE is correct. See the comments: int a, b, d, e, *f = &b, g, i; char c; int main() { /* '&c == (void*)&a' is false */ bar (&c == (void *)&a); return 0; } void bar(int p) { /* 'p' is 0 */ lbl: if (*f) g = 0; int **l = &f; *l = 0; for (; i < 1; i++) { if (i) /* i == 1 */ goto lbl; /* 'd == 0' and 'p == 0' (see above) * Therefore, this call is 'foo(2, 0)' */ if (foo (2, d || p)) { int **m = &f; *m = &e; } /* i == 1 */ } } int foo(int p1, long long p2) { /* 'p1 == 2' and 'p2' == 0 * This implies 'p1 % p2' is UB. */ return p2 == 0 ? 0 : p1 % p2; } So, it looks like it's valid for it to have a SIGFPE. The other optimization levels don't convert to the same code (of course). It results in either the '%' never being called or possibly never emitted.
> int foo(int p1, long long p2) { > /* 'p1 == 2' and 'p2' == 0 > * This implies 'p1 % p2' is UB. > */ > return p2 == 0 ? 0 : p1 % p2; > } Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't executed at all. This is exactly how we guard for division by zeros. The code is valid and doesn't have any UBs.
(In reply to comment #2) > Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't > executed at all. This is exactly how we guard for division by zeros. > > The code is valid and doesn't have any UBs. You're right. My internal C messed up on the precedence of the operators. Sorry about that.
$ cat 17073_reduced.ll ; ModuleID = 'tmp3.ll' target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128" target triple = "i386-apple-macosx10.9.0" @b = common global i32 0, align 4 @f = global i32* @b, align 4 @g = common global i32 0, align 4 @i = common global i32 0, align 4 @d = common global i32 0, align 4 @e = common global i32 0, align 4 @c = common global i8 0, align 1 @a = common global i32 0, align 4 define i32 @main() { entry: %f.promoted.i = load i32** @f, align 4, !tbaa !0 %0 = load i32* %f.promoted.i, align 4, !tbaa !4 %tobool.i = icmp eq i32 %0, 0 br i1 %tobool.i, label %if.end.i, label %if.then.i if.then.i: ; preds = %entry store i32 0, i32* @g, align 4, !tbaa !4 br label %if.end.i if.end.i: ; preds = %if.then.i, %entry %pr.i = load i32* @i, align 4, !tbaa !4 %cmp3.i = icmp slt i32 %pr.i, 1 br i1 %cmp3.i, label %for.body1, label %bar.exit for.body1: ; preds = %if.end.i %1 = load i32* @d, align 4, !tbaa !4 %tobool4 = icmp eq i32 %1, 0 br i1 %tobool4, label %for.body6, label %for.body3 for.body3: ; preds = %for.body1 br i1 icmp eq (i32* bitcast (i8* @c to i32*), i32* @a), label %if.end8, label %if.end9 if.end8: ; preds = %for.body3 store i32 1, i32* @i, align 4, !tbaa !4 br label %bar.exit if.end9: ; preds = %for.body3 store i32 1, i32* @i, align 4, !tbaa !4 br label %bar.exit for.body6: ; preds = %for.body1 store i32 1, i32* @i, align 4, !tbaa !4 br label %bar.exit bar.exit: ; preds = %for.body6, %if.end9, %if.end8, %if.end.i %storemerge33.i = phi i32* [ null, %if.end.i ], [ null, %for.body6 ], [ null, %if.end9 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @c to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @e), %if.end8 ] store i32* %storemerge33.i, i32** @f, align 4, !tbaa !0 ret i32 0 } !0 = metadata !{metadata !1, metadata !1, i64 0} !1 = metadata !{metadata !"int", metadata !2, i64 0} !2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0} !3 = metadata !{metadata !"Simple C/C++ TBAA"} !4 = metadata !{metadata !5, metadata !5, i64 0} !5 = metadata !{metadata !"any pointer", metadata !2, i64 0}
The testcase in comment 4 fails in -simplifycfg: $ ./opt 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out $ ./opt -simplifycfg 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out Floating point exception: 8
Further reduced testcase: target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128" target triple = "i386-apple-macosx10.9.0" @a = common global i32 0, align 4 @b = common global i8 0, align 1 define i32* @main() { entry: %0 = load i32* @a, align 4 %tobool = icmp eq i32 %0, 0 br i1 %tobool, label %exit, label %block1 block1: br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 block2: br label %exit exit: %storemerge = phi i32* [ null, %entry ],[ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ] ret i32* %storemerge }
simplifycfg debug output shows that it's hoisting the urem op into the entry block which leads to the exception (div by 0): FOLDING BRs: entry: %0 = load i32* @a, align 4 %tobool = icmp eq i32 %0, 0 br i1 %tobool, label %exit, label %block1 AND: block1: ; preds = %entry br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 define i32* @main() { entry: %0 = load i32* @a, align 4 %tobool = icmp eq i32 %0, 0 br i1 %tobool, label %exit, label %block1 block1: ; preds = %entry br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 block2: ; preds = %block1 br label %exit exit: ; preds = %block2, %block1, %entry %storemerge = phi i32* [ null, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ] ret i32* %storemerge } INTO: entry: %0 = load i32* @a, align 4 %tobool = icmp eq i32 %0, 0 %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) %.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a) br i1 %brmerge, label %exit, label %block2 define i32* @main() { entry: %0 = load i32* @a, align 4 %tobool = icmp eq i32 %0, 0 %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) %.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a) br i1 %brmerge, label %exit, label %block2 block1: ; No predecessors! br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 block2: ; preds = %entry, %block1 br label %exit exit: ; preds = %entry, %block2, %block1 %storemerge = phi i32* [ %.mux, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ] ret i32* %storemerge } Removing BB: block1: ; No predecessors! br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 Looking to fold block2 into exit Can't fold, phi node storemerge in exit is conflicting with regard to common predecessor entry FOUND IF CONDITION! %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) T: entry F: block2 Removing BB: block2: ; No predecessors! br label %exit
SimplifyCondBranchToCondBranch() in SimplifyCFG.cpp does the following for the code sample in comment 6: 1. Sees this branch in 'entry': br i1 %tobool, label %exit, label %block1 2. Sees this branch in 'block1': br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2 3. Notices that the 'true' condition for both branches will go to 'exit'. 4. Combines the conditions with an 'or' (brmerge). Up to here we're fine, but the destination block has a phi, so we have to fix up the phi entries. In order to do that, we hoist the phi logic (the urem, etc) for 'block1' into 'entry' and create .mux which selects the correct value based on the existing 2 conditions. I think the solution is to make sure that the code in the phi entry can't trap; ie, this transform would be safe for most code, but not division and friends. SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other cases, but it looks like that guard was missed on this particular transform.
(In reply to comment #8) ... > > I think the solution is to make sure that the code in the phi entry can't > trap; ie, this transform would be safe for most code, but not division and > friends. > > SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other > cases, but it looks like that guard was missed on this particular transform. That sounds right. FWIW, this also reminds me of PR16729 (fixed in r197449) from the loop vectorizer.
Thanks, Hal. I'll send a patch for review to the commits list tomorrow.
Patch submitted via Phabricator: http://reviews.llvm.org/D4408
Patch checked in with: http://llvm.org/viewvc/llvm-project?view=revision&revision=212490