Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IR programs printing incorrect results after being compiled with -O0 #40003

Closed
aqjune opened this issue Feb 8, 2019 · 11 comments
Closed

IR programs printing incorrect results after being compiled with -O0 #40003

aqjune opened this issue Feb 8, 2019 · 11 comments
Labels
bugzilla Issues migrated from bugzilla llvm:codegen

Comments

@aqjune
Copy link
Contributor

aqjune commented Feb 8, 2019

Bugzilla Link 40657
Resolution FIXED
Resolved on Feb 11, 2019 00:27
Version trunk
OS Linux
Attachments bug 1, buggy inputs
CC @topperc,@aqjune,@RKSimon,@nunoplopes,@rotateright
Fixed by commit(s) r353615, r353639

Extended Description

Consider this program:


define i32 @​main() {
entry:
  %call = tail call i64 @​f(i8 249, i8 101)
  %call1 = tail call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i64 %call)
  ret i32 0
}

declare i32 @​printf(i8* nocapture readonly, ...)

@.str = private unnamed_addr constant [4 x i8] c"%lu\00", align 1
define i64 @​f(i8 %var2, i8 %var9) {
  %var6 = trunc i8 %var9 to i1
  %var7 = trunc i8 175 to i1
  %var3 = sub nsw i1 %var6, %var7
  %var4 = icmp eq i64 1114591064, 1114591064
  %var1 = udiv i1 %var3, %var4
  %var0 = trunc i8 %var2 to i1
  %res = sub nsw nuw i1 %var0, %var1
  %res.cast = zext i1 %res to i64
  ret i64 %res.cast
}

This program should print 1, but it prints 0 after being compiled with -O0.

$ clang --version
clang version 9.0.0 (https://github.com/llvm-mirror/clang.git 23cffd780f5647d02c6c12491f9b2c928757c798) (https://github.com/llvm-mirror/llvm.git 903a569)
Target: x86_64-unknown-linux-gnu
$ clang -O0 -o a.out 2332174.ll
$ ./a.out
0

I attach a few more programs which should print 1 but they print 0.

@rotateright
Copy link
Contributor

Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
// (Z-X) == X --> Z == X<<1

This doesn't check to see if the operands are i1 (bool), so the shift produces poison/undef.

We don't have this problem in IR because instcombine knows that add/sub of i1 is really an 'xor':

if (Ty->isIntOrIntVectorTy(1))
return BinaryOperator::CreateXor(LHS, RHS);

@rotateright
Copy link
Contributor

Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
// (Z-X) == X --> Z == X<<1

This doesn't check to see if the operands are i1 (bool), so the shift
produces poison/undef.

We don't have this problem in IR because instcombine knows that add/sub of
i1 is really an 'xor':

if (Ty->isIntOrIntVectorTy(1))
return BinaryOperator::CreateXor(LHS, RHS);

That might've been better stated as:
We don't have this problem in IR because we don't do this questionable transform.

Ie, why is a shift better than a subtract here? I took out the fold completely, and no regression tests fail, but I guess it's better to guard against the bug as a 1st step.

Note that just adding the transforms for 'add/sub i1 -> xor i1' is not enough to fix the bug because we can reach the setcc first. (But adding those transforms might be a good change independently.)

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 9, 2019

Ie, why is a shift better than a subtract here? I took out the fold
completely, and no regression tests fail, but I guess it's better to guard
against the bug as a 1st step.

TBH, if this causes no test regressions I'd vote for removing it entirely straight away.

@aqjune
Copy link
Contributor Author

aqjune commented Feb 9, 2019

Not sure if this is the whole problem, but TargetLowering::SimplifySetCC():
// (Z-X) == X --> Z == X<<1

This doesn't check to see if the operands are i1 (bool), so the shift
produces poison/undef.

We don't have this problem in IR because instcombine knows that add/sub of
i1 is really an 'xor':

if (Ty->isIntOrIntVectorTy(1))
return BinaryOperator::CreateXor(LHS, RHS);

That might've been better stated as:
We don't have this problem in IR because we don't do this questionable
transform.

Ie, why is a shift better than a subtract here? I took out the fold
completely, and no regression tests fail, but I guess it's better to guard
against the bug as a 1st step.

Note that just adding the transforms for 'add/sub i1 -> xor i1' is not
enough to fix the bug because we can reach the setcc first. (But adding
those transforms might be a good change independently.)

Hello Sanjay,

I'm not sure whether X<<1 returns undef/poison in SelDAG (It is poison in IR, but SelDag is a lower level language), but I agree that the transformation can be the source of the bug, as SelDag and IR seems to share the semantics of arithmetic operations.
Folding '(Z-X) == X' to 'Z == X<<1' seems problematic when X is 1-bit size.

I found that disabling the folding fixes 8 buggy cases including the one in the text but 3 cases still remain:

1673800.ll: prints 0
8566692.ll: prints 0
9510019.ll: prints 0

All of them should print 1.
I tried to track which transformation was responsible for this output, but I couldn't. Do you have any idea?

p.s: 66787.ll should print 15, and disabling '(Z-X) == X' to 'Z == X<<1' correctly fixes it. I said that all programs should run 1 at comment #​1, but 66787.ll was an exception, sorry.

@rotateright
Copy link
Contributor

I'm not sure whether X<<1 returns undef/poison in SelDAG (It is poison in
IR, but SelDag is a lower level language),

SDAG has poison because it propagates the wrapping/exact flags from IR. As a practical matter, it is harder to see poison/undef problems in this layer because:

  1. Most of them would be exposed in the IR optimizer first.
  2. SDAG does not have as many of the poison/undef-based transforms that IR performs.
  3. The bug becomes invisible in the translation to the target's assembly.

I found that disabling the folding fixes 8 buggy cases including the one in
the text but 3 cases still remain:

1673800.ll: prints 0
8566692.ll: prints 0
9510019.ll: prints 0

All of them should print 1.
I tried to track which transformation was responsible for this output, but I
couldn't. Do you have any idea?

Not yet - but let's take this 1 step at a time. I think there's no question about the 1st fix, so I have committed that here:
https://reviews.llvm.org/rL353615

@rotateright
Copy link
Contributor

Ie, why is a shift better than a subtract here? I took out the fold
completely, and no regression tests fail, but I guess it's better to guard
against the bug as a 1st step.

TBH, if this causes no test regressions I'd vote for removing it entirely
straight away.

I added some tests to see if there was anything else doing the same thing. There isn't, and there is a minor potential benefit to the shift (which gets translated to add somewhere in X86ISelDAGToDAG.cpp), so I think it's ok to keep.

@rotateright
Copy link
Contributor

I added some tests to see if there was anything else doing the same thing.
There isn't, and there is a minor potential benefit to the shift (which gets
translated to add somewhere in X86ISelDAGToDAG.cpp), so I think it's ok to
keep.

That was here:
https://reviews.llvm.org/rL353619

@rotateright
Copy link
Contributor

1673800.ll: prints 0

This is the same problem from what I see. The code that was just fixed is duplicated later for the commuted case:
// X == (Z-X) --> X<<1 == Z

This whole "function" is a giant mess, so I'll see if I can reduce the code duplication.

@rotateright
Copy link
Contributor

All attached examples should be fixed after:
https://reviews.llvm.org/rL353639

Feel free to reopen if that's not correct.

Side note about fuzzing for LLVM bugs:
If you're looking for backend bugs, it might be worth trying something like this:

$ clang -O2 test.c -S -emit-llvm -Xclang -disable-llvm-optzns -o unoptimized_ir.ll
$ llc -O2 unoptimized_ir.ll -o unoptimized_ir_optimized_asm.s
$ clang unoptimized_ir_optimized_asm.s -o maybe_buggy_executable
$ clang test.c -o reference_executable
$ { compare output of } reference_executable maybe_buggy_executable

A lot of people get that 1st step wrong: you want clang to create an IR file that allows optimization by the backend, but skip intermediate optimization. That doesn't happen if you use "-O0 -emit-llvm" with clang.

@aqjune
Copy link
Contributor Author

aqjune commented Feb 11, 2019

Thank you for the advice! :)

@nathanchance
Copy link
Member

mentioned in issue llvm/llvm-bugzilla-archive#42580

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:codegen
Projects
None yet
Development

No branches or pull requests

4 participants