Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Incorrect folding of LShr into select instruction #47697

Closed
CongzheUalberta opened this issue Dec 1, 2020 · 17 comments
Closed
Labels
bugzilla Issues migrated from bugzilla llvm:instcombine

Comments

@CongzheUalberta
Copy link
Contributor

Bugzilla Link 48353
Version trunk
OS Linux
CC @aqjune,@LebedevRI,@RKSimon,@meheff,@nunoplopes,@regehr,@rotateright

Extended Description

The IR that exposes this bug is as follows.

; ModuleID = 'test.c'

@​m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @​test() local_unnamed_addr #​1 {
entry:
call fastcc void @​foo(i32 51)
ret i32 0
}

define internal fastcc void @​foo(i32 %aj) {
entry:
%cmp.i = icmp sgt i32 %aj, 1
%shr.i = select i1 %cmp.i, i32 0, i32 %aj
%cond.i1 = lshr i32 3, %shr.i
%conv.i = trunc i32 %cond.i1 to i8
%0 = and i8 %conv.i, 1
%cmp4 = icmp ugt i8 %0, 0
%conv5 = zext i1 %cmp4 to i32
store i32 %conv5, i32* @​m
ret void
}

Note that at compile time we know the argument of foo() is %aj = 51 thus "%shr.i = select i1 %cmp.i, i32 0, i32 %aj" will result in 0 and hence LShr instruction that follows is correct, not resulting in a poison value.

When running "opt -instcombine test.ll -S", the following IR is generated. Specifically, "%aj.op = lshr i32 3, %aj" is generated which result in a poison value since %aj is actually greater than 32.

@​m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @​test() local_unnamed_addr #​1 {
entry:
call fastcc void @​foo(i32 51)
ret i32 0
}

define internal fastcc void @​o(i32 %aj) {
entry:
%cmp.i = icmp sgt i32 %aj, 1
%aj.op = lshr i32 3, %aj
%.op2 = and i32 %aj.op, 1
%cmp.i3 = zext i1 %cmp.i to i32
%conv55 = or i32 %.op2, %cmp.i3
store i32 %conv55, i32* @​m, align 4
ret void
}


The cause is that during InstCombine, we fold

%shr.i = select i1 %cmp.i, i32 0, i32 %aj
%cond.i1 = lshr i32 3, %shr.i

into

%aj.op = lshr i32 3, %aj
%cond.i1 = select i1 %cmp.i, i32 3, i32 %aj.op,

thus generating the incorrect LShr instruction "%aj.op = lshr i32 3, %aj".


Our proposed fix:

In function InstCombiner::FoldOpIntoSelect() from Transforms/InstCombine/InstructionCombining.cpp, add the following check such that we only fold LShr into select when both the true value and the false value are known to be constants.

// If op is a LShr instruction, we can only fold into select when both TV
// and FV are known to be constants, otherwise they may be evaluated to be
// greater than the width of the first operand of LShr, resulting in for
// example:
// LShr i32 X Y where Y > 32,
// which is not suppose to be generated.
if (Op.getOpcode() == Instruction::LShr &&
!(isa(TV) && isa(FV))) {
return nullptr;
}

@LebedevRI
Copy link
Member

So the transform that is causing miscompile is


define i32 @​src(i32 %aj) {
%entry:
%cmp.i = icmp sgt i32 %aj, 1
%aj.op = lshr i32 3, %aj
%.op2 = and i32 %aj.op, 1
%cmp41 = icmp ne i32 %.op2, 0
%cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
%conv5 = zext i1 %cmp4 to i32
ret i32 %conv5
}
=>
define i32 @​tgt(i32 %aj) {
%entry:
%cmp.i = icmp sgt i32 %aj, 1
%aj.op = lshr i32 3, %aj
%.op2 = and i32 %aj.op, 1
%cmp41 = icmp ne i32 %.op2, 0
%cmp4 = or i1 %cmp.i, %cmp41
%conv5 = zext i1 %cmp4 to i32
ret i32 %conv5
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i32 %aj = #x00000020 (32)

Source:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = #x1 (1)
i32 %conv5 = #x00000001 (1)

Target:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = poison
i32 %conv5 = poison
Source value: #x00000001 (1)
Target value: poison

I.e.


define i1 @​src(i1 %cmp.i, i1 %cmp41) {
%entry:
%cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
ret i1 %cmp4
}
=>
define i1 @​tgt(i1 %cmp.i, i1 %cmp41) {
%entry:
%cmp4 = or i1 %cmp.i, %cmp41
ret i1 %cmp4
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i1 %cmp.i = #x1 (1)
i1 %cmp41 = poison

Source:
i1 %cmp4 = #x1 (1)

Target:
i1 %cmp4 = poison
Source value: #x1 (1)
Target value: poison

@LebedevRI
Copy link
Member

... time to apply freeze?

@aqjune
Copy link
Contributor

aqjune commented Dec 2, 2020

Would it be a big change if optimizations that deal with or/and are updated to deal with the corresponding select patterns?

@nunoplopes
Copy link
Member

I tend to agree. I don't see much value in turning select -> and/or + freeze. I would rather keep the select and make sure the backends are happy with it.
I think Sanjay already did some work in SelDAG to make it like selects more AFAIR.

@rotateright
Copy link
Contributor

Yes, we've done some work in codegen to make it more robust with select in IR. We've also tried to patch instcombine to make it more flexible about recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:

if (SelType->isIntOrIntVectorTy(1) &&

And see what regresses?

@nunoplopes
Copy link
Member

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Sounds like a plan! :)
(that whole block seems incorrect, yes)

@CongzheUalberta
Copy link
Contributor Author

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion!

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion! Are you going to upstream it? We are going to internally implement a patch as suggested and measure the performance anyways, so we could just post the bug/solution/result to Phabricator if that makes things easier. I'd appreciate your comment on it.

@CongzheUalberta
Copy link
Contributor Author

I tend to agree. I don't see much value in turning select -> and/or +
freeze. I would rather keep the select and make sure the backends are happy
with it.
I think Sanjay already did some work in SelDAG to make it like selects more
AFAIR.

Thanks for the comment. I'm just curious what would be the potential solution of "turning select -> and/or + freeze", what does it look like? Is it like just freezing the result of and/or instruction?

@aqjune
Copy link
Contributor

aqjune commented Dec 3, 2020

The solution that uses freeze is to insert freeze at the operand or and/or coming from non-conditional value of select:

select x, true, y -> or x, freeze(y)
select x, y, false -> and x, freeze(y)

@aqjune
Copy link
Contributor

aqjune commented Dec 3, 2020

Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

@CongzheUalberta
Copy link
Contributor Author

Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

Thanks for the demo! it makes perfect sense.

@rotateright
Copy link
Contributor

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion!

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion! Are you going to upstream it? We are going to
internally implement a patch as suggested and measure the performance
anyways, so we could just post the bug/solution/result to Phabricator if
that makes things easier. I'd appreciate your comment on it.

Sorry, I missed that comment initially. If you can collect some perf measurements with that change and/or post to Phab, that would be great.

I just looked at some of the instcombine regression test diffs, and we may need to accelerate the canonicalization to the min/max intrinsics to avoid problems. But we should check codegen for various targets to see if those are really regressions that we must address.

@CongzheUalberta
Copy link
Contributor Author

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion!

Yes, we've done some work in codegen to make it more robust with select in
IR. We've also tried to patch instcombine to make it more flexible about
recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/
f03c21d/llvm/lib/Transforms/InstCombine/
InstCombineSelect.cpp#L2627

And see what regresses?

Thanks for the suggestion! Are you going to upstream it? We are going to
internally implement a patch as suggested and measure the performance
anyways, so we could just post the bug/solution/result to Phabricator if
that makes things easier. I'd appreciate your comment on it.

Sorry, I missed that comment initially. If you can collect some perf
measurements with that change and/or post to Phab, that would be great.

I just looked at some of the instcombine regression test diffs, and we may
need to accelerate the canonicalization to the min/max intrinsics to avoid
problems. But we should check codegen for various targets to see if those
are really regressions that we must address.

Thanks for the reply! I followed the suggestion that removes this block:

if (SelType->isIntOrIntVectorTy(1) &&
.

But then the built clang crashes when building some tests. The cause is that, since that block has been removed, when we now process instructions like "select i1 Condition, i1 TrueValue, i1 FalseValue", we'll enter the following code block to optimize it:

if (SelType->isIntOrIntVectorTy() &&

And this results in "zext i1 to i1", causing the compiler to crash.

My fix to the crash is that, in the condition checking (line 2669), adding the following checks to make sure TrueVal and FalseVal are not i1 type, which now looks like:

if (SelType->isIntOrIntVectorTy() &&
CondVal->getType()->isVectorTy() == SelType->isVectorTy() &&
(TrueVal->getType()->isIntegerTy() &&
TrueVal->getType()->getIntegerBitWidth() != 1) &&
(FalseVal->getType()->isIntegerTy() &&
FalseVal->getType()->getIntegerBitWidth() != 1))

I'm wondering if you have comments on this fix? If it looks fine I'll go ahead measuring the performance. Thanks!

@CongzheUalberta
Copy link
Contributor Author

Posted the fix suggested to Phabricator as work-in-progress, will update the progress and performance numbers there. Any review is welcome, please just let me know if I happen to miss someone.

https://reviews.llvm.org/D93065

@CongzheUalberta
Copy link
Contributor Author

*** Bug llvm/llvm-bugzilla-archive#48435 has been marked as a duplicate of this bug. ***

@CongzheUalberta
Copy link
Contributor Author

mentioned in issue llvm/llvm-bugzilla-archive#48435

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@nikic
Copy link
Contributor

nikic commented May 17, 2022

The logical and/or changes have since landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:instcombine
Projects
None yet
Development

No branches or pull requests

7 participants