40657 – IR programs printing incorrect results after being compiled with -O0

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 40657 - IR programs printing incorrect results after being compiled with -O0

Summary: IR programs printing incorrect results after being compiled with -O0

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-02-07 23:53 PST by Juneyoung Lee
Modified:	2019-02-11 00:27 PST (History)
CC List:	7 users (show)

See Also:
Fixed By Commit(s):	r353615, r353639

Attachments
bug 1 (712 bytes, text/plain) 2019-02-07 23:53 PST, Juneyoung Lee	Details
buggy inputs (5.06 KB, application/x-zip-compressed) 2019-02-07 23:54 PST, Juneyoung Lee	Details
Show Obsolete (1) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Juneyoung Lee 2019-02-07 23:53:00 PST

Created attachment 21449 [details]
bug 1

Consider this program:

```

define i32 @main() {
entry:
  %call = tail call i64 @f(i8 249, i8 101)
  %call1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i64 %call)
  ret i32 0
}

declare i32 @printf(i8* nocapture readonly, ...)

@.str = private unnamed_addr constant [4 x i8] c"%lu\00", align 1
define i64 @f(i8 %var2, i8 %var9) {
  %var6 = trunc i8 %var9 to i1
  %var7 = trunc i8 175 to i1
  %var3 = sub nsw i1 %var6, %var7
  %var4 = icmp eq i64 1114591064, 1114591064
  %var1 = udiv i1 %var3, %var4
  %var0 = trunc i8 %var2 to i1
  %res = sub nsw nuw i1 %var0, %var1
  %res.cast = zext i1 %res to i64
  ret i64 %res.cast
}
```

This program should print 1, but it prints 0 after being compiled with -O0.

$ clang --version
clang version 9.0.0 (https://github.com/llvm-mirror/clang.git 23cffd780f5647d02c6c12491f9b2c928757c798) (https://github.com/llvm-mirror/llvm.git 903a5691e0214c29a4da3796f9a4d598ad3d18f3)
Target: x86_64-unknown-linux-gnu
$ clang -O0 -o a.out 2332174.ll
$ ./a.out
0

I attach a few more programs which should print 1 but they print 0.

Comment 1 Juneyoung Lee 2019-02-07 23:54:29 PST

Created attachment 21450 [details]
buggy inputs

Comment 2 Sanjay Patel 2019-02-08 13:40:46 PST

Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
            // (Z-X) == X  --> Z == X<<1

This doesn't check to see if the operands are i1 (bool), so the shift produces poison/undef.

We don't have this problem in IR because instcombine knows that add/sub of i1 is really an 'xor':

  if (Ty->isIntOrIntVectorTy(1))
    return BinaryOperator::CreateXor(LHS, RHS);

Comment 3 Sanjay Patel 2019-02-08 15:25:51 PST

(In reply to Sanjay Patel from comment #2)
> Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
>             // (Z-X) == X  --> Z == X<<1
> 
> This doesn't check to see if the operands are i1 (bool), so the shift
> produces poison/undef.
> 
> We don't have this problem in IR because instcombine knows that add/sub of
> i1 is really an 'xor':
> 
>   if (Ty->isIntOrIntVectorTy(1))
>     return BinaryOperator::CreateXor(LHS, RHS);

That might've been better stated as:
We don't have this problem in IR because we don't do this questionable transform. 

Ie, why is a shift better than a subtract here? I took out the fold completely, and no regression tests fail, but I guess it's better to guard against the bug as a 1st step.

Note that just adding the transforms for 'add/sub i1 -> xor i1' is not enough to fix the bug because we can reach the setcc first. (But adding those transforms might be a good change independently.)

Comment 4 Simon Pilgrim 2019-02-09 03:59:49 PST

(In reply to Sanjay Patel from comment #3)
> 
> Ie, why is a shift better than a subtract here? I took out the fold
> completely, and no regression tests fail, but I guess it's better to guard
> against the bug as a 1st step.

TBH, if this causes no test regressions I'd vote for removing it entirely straight away.

Comment 5 Juneyoung Lee 2019-02-09 04:35:58 PST

(In reply to Sanjay Patel from comment #3)
> (In reply to Sanjay Patel from comment #2)
> > Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
> >             // (Z-X) == X  --> Z == X<<1
> > 
> > This doesn't check to see if the operands are i1 (bool), so the shift
> > produces poison/undef.
> > 
> > We don't have this problem in IR because instcombine knows that add/sub of
> > i1 is really an 'xor':
> > 
> >   if (Ty->isIntOrIntVectorTy(1))
> >     return BinaryOperator::CreateXor(LHS, RHS);
> 
> That might've been better stated as:
> We don't have this problem in IR because we don't do this questionable
> transform. 
> 
> Ie, why is a shift better than a subtract here? I took out the fold
> completely, and no regression tests fail, but I guess it's better to guard
> against the bug as a 1st step.
> 
> Note that just adding the transforms for 'add/sub i1 -> xor i1' is not
> enough to fix the bug because we can reach the setcc first. (But adding
> those transforms might be a good change independently.)

Hello Sanjay,

I'm not sure whether X<<1 returns undef/poison in SelDAG (It is poison in IR, but SelDag is a lower level language), but I agree that the transformation can be the source of the bug, as SelDag and IR seems to share the semantics of arithmetic operations.
Folding '(Z-X) == X' to 'Z == X<<1' seems problematic when X is 1-bit size.

I found that disabling the folding fixes 8 buggy cases including the one in the text but 3 cases still remain:

1673800.ll: prints 0                
8566692.ll: prints 0                
9510019.ll: prints 0                

All of them should print 1.
I tried to track which transformation was responsible for this output, but I couldn't. Do you have any idea?

p.s: 66787.ll should print 15, and disabling '(Z-X) == X' to 'Z == X<<1' correctly fixes it. I said that all programs should run 1 at comment #1, but 66787.ll was an exception, sorry.

Comment 6 Sanjay Patel 2019-02-09 08:03:08 PST

(In reply to Juneyoung Lee from comment #5)
> I'm not sure whether X<<1 returns undef/poison in SelDAG (It is poison in
> IR, but SelDag is a lower level language), 

SDAG has poison because it propagates the wrapping/exact flags from IR. As a practical matter, it is harder to see poison/undef problems in this layer because:
1. Most of them would be exposed in the IR optimizer first.
2. SDAG does not have as many of the poison/undef-based transforms that IR performs.
3. The bug becomes invisible in the translation to the target's assembly. 


> I found that disabling the folding fixes 8 buggy cases including the one in
> the text but 3 cases still remain:
> 
> 1673800.ll: prints 0                
> 8566692.ll: prints 0                
> 9510019.ll: prints 0                
> 
> All of them should print 1.
> I tried to track which transformation was responsible for this output, but I
> couldn't. Do you have any idea?

Not yet - but let's take this 1 step at a time. I think there's no question about the 1st fix, so I have committed that here:
https://reviews.llvm.org/rL353615

Comment 7 Sanjay Patel 2019-02-09 09:08:28 PST

(In reply to Simon Pilgrim from comment #4)
> (In reply to Sanjay Patel from comment #3)
> > 
> > Ie, why is a shift better than a subtract here? I took out the fold
> > completely, and no regression tests fail, but I guess it's better to guard
> > against the bug as a 1st step.
> 
> TBH, if this causes no test regressions I'd vote for removing it entirely
> straight away.

I added some tests to see if there was anything else doing the same thing. There isn't, and there is a minor potential benefit to the shift (which gets translated to add somewhere in X86ISelDAGToDAG.cpp), so I think it's ok to keep.

Comment 8 Sanjay Patel 2019-02-09 09:14:33 PST

(In reply to Sanjay Patel from comment #7)
> I added some tests to see if there was anything else doing the same thing.
> There isn't, and there is a minor potential benefit to the shift (which gets
> translated to add somewhere in X86ISelDAGToDAG.cpp), so I think it's ok to
> keep.

That was here:
https://reviews.llvm.org/rL353619

Comment 9 Sanjay Patel 2019-02-09 09:31:33 PST

(In reply to Sanjay Patel from comment #6) 
> > 1673800.ll: prints 0                

This is the same problem from what I see. The code that was just fixed is duplicated later for the commuted case:
          // X == (Z-X)  --> X<<1 == Z

This whole "function" is a giant mess, so I'll see if I can reduce the code duplication.

Comment 10 Sanjay Patel 2019-02-10 07:03:46 PST

All attached examples should be fixed after:
https://reviews.llvm.org/rL353639

Feel free to reopen if that's not correct.

Side note about fuzzing for LLVM bugs:
If you're looking for backend bugs, it might be worth trying something like this:

$ clang -O2 test.c -S -emit-llvm -Xclang -disable-llvm-optzns -o unoptimized_ir.ll
$ llc -O2 unoptimized_ir.ll -o unoptimized_ir_optimized_asm.s
$ clang unoptimized_ir_optimized_asm.s -o maybe_buggy_executable
$ clang test.c -o reference_executable
$ { compare output of } reference_executable maybe_buggy_executable

A lot of people get that 1st step wrong: you want clang to create an IR file that allows optimization by the backend, but skip intermediate optimization. That doesn't happen if you use "-O0 -emit-llvm" with clang.

Comment 11 Juneyoung Lee 2019-02-11 00:27:47 PST

Thank you for the advice! :)