17073 – wrong code (SIGFPE) at -O3 on x86_64-linux-gnu (in 32-bit mode) (-simplifycfg)

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 17073 - wrong code (SIGFPE) at -O3 on x86_64-linux-gnu (in 32-bit mode) (-simplifycfg)

Summary: wrong code (SIGFPE) at -O3 on x86_64-linux-gnu (in 32-bit mode) (-simplifycfg)

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Transformation Utilities (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-09-03 01:10 PDT by Zhendong Su
Modified:	2014-07-07 16:21 PDT (History)
CC List:	5 users (show)

See Also:	16729
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Zhendong Su 2013-09-03 01:10:57 PDT

The following code is miscompiled by the current clang trunk, clang 3.2, and clang 3.3 on x86_64-linux-gnu at -O3 in 32-bit mode, resulting in a SIGFPE. 

$ clang-trunk -v
clang version 3.4 (trunk 189735)
$ clang-trunk -m32 -O2 small.c
$ a.out
$ clang-trunk -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-3.3 -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-3.2 -m32 -O3 small.c
$ a.out
Floating point exception (core dumped)
$ clang-trunk -m64 -O3 small.c
$ a.out
$ 


------------------------------------

int a, b, d, e, *f = &b, g, i;
char c;

int 
foo (int p1, long long p2)
{
  return p2 == 0 ? 0 : p1 % p2;
}

void
bar (int p)
{
 lbl:
  if (*f)
    g = 0;
  int **l = &f;
  *l = 0;
  for (; i < 1; i++)
    {
      if (i)
	goto lbl;
      if (foo (2, d || p))
	{
	  int **m = &f;
	  *m = &e;
	}
    }
}

int
main ()
{
  bar (&c == (void *)&a);
  return 0;
}

Comment 1 Bill Wendling 2014-01-04 21:24:04 PST

The SIGFPE is correct. See the comments:

int a, b, d, e, *f = &b, g, i;
char c;

int main() {
  /* '&c == (void*)&a' is false */
  bar (&c == (void *)&a);
  return 0;
}

void bar(int p) {
 /* 'p' is 0 */
 lbl:
  if (*f)
    g = 0;
  int **l = &f;
  *l = 0;
  for (; i < 1; i++)
    {
      if (i)
        /* i == 1 */
	goto lbl;
      /* 'd == 0' and 'p == 0' (see above)
       * Therefore, this call is 'foo(2, 0)'
       */
      if (foo (2, d || p))
	{
	  int **m = &f;
	  *m = &e;
	}
      /* i == 1 */
    }
}

int foo(int p1, long long p2) {
  /* 'p1 == 2' and 'p2' == 0
   * This implies 'p1 % p2' is UB.
   */
  return p2 == 0 ? 0 : p1 % p2;
}

So, it looks like it's valid for it to have a SIGFPE. The other optimization levels don't convert to the same code (of course). It results in either the '%' never being called or possibly never emitted.

Comment 2 Zhendong Su 2014-01-04 22:01:09 PST

> int foo(int p1, long long p2) {
>   /* 'p1 == 2' and 'p2' == 0
>    * This implies 'p1 % p2' is UB.
>    */
>   return p2 == 0 ? 0 : p1 % p2;
> }

Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't executed at all.  This is exactly how we guard for division by zeros. 

The code is valid and doesn't have any UBs.

Comment 3 Bill Wendling 2014-01-05 18:14:49 PST

(In reply to comment #2)
> Bill, we'll need to be careful here. Since p2 == 0 holds, p1 % p2 isn't
> executed at all.  This is exactly how we guard for division by zeros. 
> 
> The code is valid and doesn't have any UBs.

You're right. My internal C messed up on the precedence of the operators. Sorry about that.

Comment 4 Sanjay Patel 2014-07-04 15:16:15 PDT

$ cat 17073_reduced.ll 
; ModuleID = 'tmp3.ll'
target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.9.0"

@b = common global i32 0, align 4
@f = global i32* @b, align 4
@g = common global i32 0, align 4
@i = common global i32 0, align 4
@d = common global i32 0, align 4
@e = common global i32 0, align 4
@c = common global i8 0, align 1
@a = common global i32 0, align 4

define i32 @main() {
entry:
  %f.promoted.i = load i32** @f, align 4, !tbaa !0
  %0 = load i32* %f.promoted.i, align 4, !tbaa !4
  %tobool.i = icmp eq i32 %0, 0
  br i1 %tobool.i, label %if.end.i, label %if.then.i

if.then.i:                                        ; preds = %entry
  store i32 0, i32* @g, align 4, !tbaa !4
  br label %if.end.i

if.end.i:                                         ; preds = %if.then.i, %entry
  %pr.i = load i32* @i, align 4, !tbaa !4
  %cmp3.i = icmp slt i32 %pr.i, 1
  br i1 %cmp3.i, label %for.body1, label %bar.exit

for.body1:                                        ; preds = %if.end.i
  %1 = load i32* @d, align 4, !tbaa !4
  %tobool4 = icmp eq i32 %1, 0
  br i1 %tobool4, label %for.body6, label %for.body3

for.body3:                                        ; preds = %for.body1
  br i1 icmp eq (i32* bitcast (i8* @c to i32*), i32* @a), label %if.end8, label %if.end9

if.end8:                                          ; preds = %for.body3
  store i32 1, i32* @i, align 4, !tbaa !4
  br label %bar.exit

if.end9:                                          ; preds = %for.body3
  store i32 1, i32* @i, align 4, !tbaa !4
  br label %bar.exit

for.body6:                                        ; preds = %for.body1
  store i32 1, i32* @i, align 4, !tbaa !4
  br label %bar.exit

bar.exit:                                         ; preds = %for.body6, %if.end9, %if.end8, %if.end.i
  %storemerge33.i = phi i32* [ null, %if.end.i ], [ null, %for.body6 ], [ null, %if.end9 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @c to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @e), %if.end8 ]
  store i32* %storemerge33.i, i32** @f, align 4, !tbaa !0
  ret i32 0
}

!0 = metadata !{metadata !1, metadata !1, i64 0}
!1 = metadata !{metadata !"int", metadata !2, i64 0}
!2 = metadata !{metadata !"omnipotent char", metadata !3, i64 0}
!3 = metadata !{metadata !"Simple C/C++ TBAA"}
!4 = metadata !{metadata !5, metadata !5, i64 0}
!5 = metadata !{metadata !"any pointer", metadata !2, i64 0}

Comment 5 Sanjay Patel 2014-07-04 15:17:28 PDT

The testcase in comment 4 fails in -simplifycfg:

$ ./opt 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out 

$ ./opt -simplifycfg 17073_reduced.ll | ./llc | ./clang -m32 -x assembler - -o a.out -Wl,-no_pie; ./a.out 
Floating point exception: 8

Comment 6 Sanjay Patel 2014-07-05 16:02:35 PDT

Further reduced testcase:

target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-macosx10.9.0"

@a = common global i32 0, align 4
@b = common global i8 0, align 1

define i32* @main() {
entry:
  %0 = load i32* @a, align 4
  %tobool = icmp eq i32 %0, 0
  br i1 %tobool, label %exit, label %block1

block1:
  br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2

block2:
  br label %exit

exit:
  %storemerge = phi i32* [ null, %entry ],[ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ]
  ret i32* %storemerge
}

Comment 7 Sanjay Patel 2014-07-05 16:04:35 PDT

simplifycfg debug output shows that it's hoisting the urem op into the entry block which leads to the exception (div by 0):

FOLDING BRs:
entry:
  %0 = load i32* @a, align 4
  %tobool = icmp eq i32 %0, 0
  br i1 %tobool, label %exit, label %block1
AND: 
block1:                                           ; preds = %entry
  br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2

define i32* @main() {
entry:
  %0 = load i32* @a, align 4
  %tobool = icmp eq i32 %0, 0
  br i1 %tobool, label %exit, label %block1

block1:                                           ; preds = %entry
  br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2

block2:                                           ; preds = %block1
  br label %exit

exit:                                             ; preds = %block2, %block1, %entry
  %storemerge = phi i32* [ null, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ]
  ret i32* %storemerge
}
INTO: 
entry:
  %0 = load i32* @a, align 4
  %tobool = icmp eq i32 %0, 0
  %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a)
  %.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a)
  br i1 %brmerge, label %exit, label %block2

define i32* @main() {
entry:
  %0 = load i32* @a, align 4
  %tobool = icmp eq i32 %0, 0
  %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a)
  %.mux = select i1 %tobool, i32* null, i32* select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a)
  br i1 %brmerge, label %exit, label %block2

block1:                                           ; No predecessors!
  br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2

block2:                                           ; preds = %entry, %block1
  br label %exit

exit:                                             ; preds = %entry, %block2, %block1
  %storemerge = phi i32* [ %.mux, %entry ], [ null, %block2 ], [ select (i1 icmp eq (i64 urem (i64 2, i64 zext (i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a) to i64)), i64 0), i32* null, i32* @a), %block1 ]
  ret i32* %storemerge
}
Removing BB: 

block1:                                           ; No predecessors!
  br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2
Looking to fold block2 into exit
Can't fold, phi node storemerge in exit is conflicting with regard to common predecessor entry
FOUND IF CONDITION!    %brmerge = or i1 %tobool, icmp eq (i32* bitcast (i8* @b to i32*), i32* @a)  T: entry  F: block2
Removing BB: 

block2:                                           ; No predecessors!
  br label %exit

Comment 8 Sanjay Patel 2014-07-06 18:01:09 PDT

SimplifyCondBranchToCondBranch() in SimplifyCFG.cpp does the following for the code sample in comment 6:

1. Sees this branch in 'entry':  br i1 %tobool, label %exit, label %block1

2. Sees this branch in 'block1':   br i1 icmp eq (i32* bitcast (i8* @b to i32*), i32* @a), label %exit, label %block2

3. Notices that the 'true' condition for both branches will go to 'exit'.

4. Combines the conditions with an 'or' (brmerge).

Up to here we're fine, but the destination block has a phi, so we have to fix up the phi entries. In order to do that, we hoist the phi logic (the urem, etc) for 'block1' into 'entry' and create .mux which selects the correct value based on the existing 2 conditions.

I think the solution is to make sure that the code in the phi entry can't trap; ie, this transform would be safe for most code, but not division and friends.

SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other cases, but it looks like that guard was missed on this particular transform.

Comment 9 Hal Finkel 2014-07-06 20:16:56 PDT

(In reply to comment #8)
...
> 
> I think the solution is to make sure that the code in the phi entry can't
> trap; ie, this transform would be safe for most code, but not division and
> friends.
> 
> SimplifyCFG uses Constant::canTrap() or SpeculativelyExecuteBB() for other
> cases, but it looks like that guard was missed on this particular transform.

That sounds right.

FWIW, this also reminds me of PR16729 (fixed in r197449) from the loop vectorizer.

Comment 10 Sanjay Patel 2014-07-06 20:30:57 PDT

Thanks, Hal. I'll send a patch for review to the commits list tomorrow.

Comment 11 Sanjay Patel 2014-07-07 11:48:47 PDT

Patch submitted via Phabricator:
http://reviews.llvm.org/D4408

Comment 12 Sanjay Patel 2014-07-07 16:21:32 PDT

Patch checked in with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=212490