Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong code (SIGFPE) at -O3 on x86_64-linux-gnu (in 32-bit mode) (-simplifycfg) #17447

Closed
zhendongsu opened this issue Sep 3, 2013 · 12 comments
Closed
Labels
bugzilla Issues migrated from bugzilla

Comments

@zhendongsu
Copy link

Bugzilla Link 17073
Resolution FIXED
Resolved on Jul 07, 2014 16:21
Version trunk
OS All
CC @hfinkel,@isanbard,@rotateright

Extended Description

The following code is miscompiled by the current clang trunk, clang 3.2, and clang 3.3 on x86_64-linux-gnu at -O3 in 32-bit mode, resulting in a SIGFPE.

$ clang-trunk -v
clang version 3.4 (trunk 189735)
$ clang-trunk -m32 -O2 small.c
$ a.out
$ clang-trunk -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-3.3 -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-3.2 -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-trunk -m64 -O3 small.c
$ a.out
$


int a, b, d, e, *f = &b, g, i;
char c;

int
foo (int p1, long long p2)
{
return p2 == 0 ? 0 : p1 % p2;
}

void
bar (int p)
{
lbl:
if (*f)
g = 0;
int **l = &f;
*l = 0;
for (; i < 1; i++)
{
if (i)
goto lbl;
if (foo (2, d || p))
{
int **m = &f;
*m = &e;
}
}
}

int
main ()
{
bar (&c == (void *)&a);
return 0;
}

@isanbard
Copy link
Contributor

isanbard commented Jan 5, 2014

The SIGFPE is correct. See the comments:

int a, b, d, e, *f = &b, g, i;
char c;

int main() {
/* '&c == (void*)&a' is false */
bar (&c == (void *)&a);
return 0;
}

void bar(int p) {
/* 'p' is 0 */
lbl:
if (*f)
g = 0;
int **l = &f;
l = 0;
for (; i < 1; i++)
{
if (i)
/
i == 1 /
goto lbl;
/
'd == 0' and 'p == 0' (see above)
* Therefore, this call is 'foo(2, 0)'
*/
if (foo (2, d || p))
{
int **m = &f;
m = &e;
}
/
i == 1 */
}
}

int foo(int p1, long long p2) {
/* 'p1 == 2' and 'p2' == 0

  • This implies 'p1 % p2' is UB.
    */
    return p2 == 0 ? 0 : p1 % p2;
    }

So, it looks like it's valid for it to have a SIGFPE. The other optimization levels don't convert to the same code (of course). It results in either the '%' never being called or possibly never emitted.

@zhendongsu
Copy link
Author

int foo(int p1, long long p2) {
/* 'p1 == 2' and 'p2' == 0

  • This implies 'p1 % p2' is UB.
    */
    return p2 == 0 ? 0 : p1 % p2;
    }

Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't executed at all. This is exactly how we guard for division by zeros.

The code is valid and doesn't have any UBs.

@isanbard
Copy link
Contributor

isanbard commented Jan 6, 2014

Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't
executed at all. This is exactly how we guard for division by zeros.

The code is valid and doesn't have any UBs.

You're right. My internal C messed up on the precedence of the operators. Sorry about that.

@rotateright
Copy link
Contributor

$ cat 17073_reduced.ll
; ModuleID = 'tmp3.ll'
target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.9.0"

@​b = common global i32 0, align 4
@​f = global i32* @​b, align 4
@​g = common global i32 0, align 4
@​i = common global i32 0, align 4
@​d = common global i32 0, align 4
@​e = common global i32 0, align 4
@​c = common global i8 0, align 1
@​a = common global i32 0, align 4

define i32 @​main() {
entry:
%f.promoted.i = load i32** @​f, align 4, !tbaa !​0
%0 = load i32* %f.promoted.i, align 4, !tbaa !​4
%tobool.i = icmp eq i32 %0, 0
br i1 %tobool.i, label %if.end.i, label %if.then.i

if.then.i: ; preds = %entry
store i32 0, i32* @​g, align 4, !tbaa !​4
br label %if.end.i

if.end.i: ; preds = %if.then.i, %entry
%pr.i = load i32* @​i, align 4, !tbaa !​4
%cmp3.i = icmp slt i32 %pr.i, 1
br i1 %cmp3.i, label %for.body1, label %bar.exit

for.body1: ; preds = %if.end.i
%1 = load i32* @​d, align 4, !tbaa !​4
%tobool4 = icmp eq i32 %1, 0
br i1 %tobool4, label %for.body6, label %for.body3

for.body3: ; preds = %for.body1
br i1 icmp eq (i32* bitcast (i8* @​c to i32*), i32* @​a), label %if.end8, label %if.end9

if.end8: ; preds = %for.body3
store i32 1, i32* @​i, align 4, !tbaa !​4
br label %bar.exit

if.end9: ; preds = %for.body3
store i32 1, i32* @​i, align 4, !tbaa !​4
br label %bar.exit

for.body6: ; preds = %for.body1
store i32 1, i32* @​i, align 4, !tbaa !​4
br label %bar.exit

bar.exit: ; preds = %for.body6, %if.end9, %if.end8, %if.end.i
%storemerge33.i = phi i32* [ null, %if.end.i ], [ null, %for.body6 ], [ null, %if.end9 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​c to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​e), %if.end8 ]
store i32* %storemerge33.i, i32** @​f, align 4, !tbaa !​0
ret i32 0
}

!​0 = metadata !{metadata !​1, metadata !​1, i64 0}
!​1 = metadata !{metadata !"int", metadata !​2, i64 0}
!​2 = metadata !{metadata !"omnipotent char", metadata !​3, i64 0}
!​3 = metadata !{metadata !"Simple C/C++ TBAA"}
!​4 = metadata !{metadata !​5, metadata !​5, i64 0}
!​5 = metadata !{metadata !"any pointer", metadata !​2, i64 0}

@rotateright
Copy link
Contributor

The testcase in comment 4 fails in -simplifycfg:

$ ./opt 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out

$ ./opt -simplifycfg 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out
Floating point exception: 8

@rotateright
Copy link
Contributor

Further reduced testcase:

target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.9.0"

@​a = common global i32 0, align 4
@​b = common global i8 0, align 1

define i32* @​main() {
entry:
%0 = load i32* @​a, align 4
%tobool = icmp eq i32 %0, 0
br i1 %tobool, label %exit, label %block1

block1:
br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2

block2:
br label %exit

exit:
%storemerge = phi i32* [ null, %entry ],[ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​a), %block1 ]
ret i32* %storemerge
}

@rotateright
Copy link
Contributor

simplifycfg debug output shows that it's hoisting the urem op into the entry block which leads to the exception (div by 0):

FOLDING BRs:
entry:
%0 = load i32* @​a, align 4
%tobool = icmp eq i32 %0, 0
br i1 %tobool, label %exit, label %block1
AND:
block1: ; preds = %entry
br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2

define i32* @​main() {
entry:
%0 = load i32* @​a, align 4
%tobool = icmp eq i32 %0, 0
br i1 %tobool, label %exit, label %block1

block1: ; preds = %entry
br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2

block2: ; preds = %block1
br label %exit

exit: ; preds = %block2, %block1, %entry
%storemerge = phi i32* [ null, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​a), %block1 ]
ret i32* %storemerge
}
INTO:
entry:
%0 = load i32* @​a, align 4
%tobool = icmp eq i32 %0, 0
%brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a)
%.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​a)
br i1 %brmerge, label %exit, label %block2

define i32* @​main() {
entry:
%0 = load i32* @​a, align 4
%tobool = icmp eq i32 %0, 0
%brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a)
%.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​a)
br i1 %brmerge, label %exit, label %block2

block1: ; No predecessors!
br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2

block2: ; preds = %entry, %block1
br label %exit

exit: ; preds = %entry, %block2, %block1
%storemerge = phi i32* [ %.mux, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) to i64)), i64 0), i32* null, i32* @​a), %block1 ]
ret i32* %storemerge
}
Removing BB:

block1: ; No predecessors!
br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2
Looking to fold block2 into exit
Can't fold, phi node storemerge in exit is conflicting with regard to common predecessor entry
FOUND IF CONDITION! %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a) T: entry F: block2
Removing BB:

block2: ; No predecessors!
br label %exit

@rotateright
Copy link
Contributor

SimplifyCondBranchToCondBranch() in SimplifyCFG.cpp does the following for the code sample in comment 6:

  1. Sees this branch in 'entry': br i1 %tobool, label %exit, label %block1

  2. Sees this branch in 'block1': br i1 icmp eq (i32* bitcast (i8* @​b to i32*), i32* @​a), label %exit, label %block2

  3. Notices that the 'true' condition for both branches will go to 'exit'.

  4. Combines the conditions with an 'or' (brmerge).

Up to here we're fine, but the destination block has a phi, so we have to fix up the phi entries. In order to do that, we hoist the phi logic (the urem, etc) for 'block1' into 'entry' and create .mux which selects the correct value based on the existing 2 conditions.

I think the solution is to make sure that the code in the phi entry can't trap; ie, this transform would be safe for most code, but not division and friends.

SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other cases, but it looks like that guard was missed on this particular transform.

@hfinkel
Copy link
Collaborator

hfinkel commented Jul 7, 2014

...

I think the solution is to make sure that the code in the phi entry can't
trap; ie, this transform would be safe for most code, but not division and
friends.

SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other
cases, but it looks like that guard was missed on this particular transform.

That sounds right.

FWIW, this also reminds me of #17103 (fixed in r197449) from the loop vectorizer.

@rotateright
Copy link
Contributor

Thanks, Hal. I'll send a patch for review to the commits list tomorrow.

@rotateright
Copy link
Contributor

Patch submitted via Phabricator:
http://reviews.llvm.org/D4408

@rotateright
Copy link
Contributor

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants