48353 – [InstCombine] Incorrect folding of LShr into select instruction

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 48353 - [InstCombine] Incorrect folding of LShr into select instruction

Summary: [InstCombine] Incorrect folding of LShr into select instruction

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Duplicates (1):	48435 (view as bug list)
Depends on:
Blocks:

Reported:	2020-12-01 14:16 PST by Congzhe Cao
Modified:	2020-12-22 12:07 PST (History)
CC List:	8 users (show)

See Also:	48435
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Congzhe Cao 2020-12-01 14:16:31 PST

The IR that exposes this bug is as follows. 

; ModuleID = 'test.c'

@m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @test() local_unnamed_addr #1 {
entry:
  call fastcc void @foo(i32 51)
  ret i32 0
}

define internal fastcc void @foo(i32 %aj) {
entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %shr.i = select i1 %cmp.i, i32 0, i32 %aj
  %cond.i1 = lshr i32 3, %shr.i
  %conv.i = trunc i32 %cond.i1 to i8
  %0 = and i8 %conv.i, 1
  %cmp4 = icmp ugt i8 %0, 0
  %conv5 = zext i1 %cmp4 to i32
  store i32 %conv5, i32* @m
  ret void
}

Note that at compile time we know the argument of foo() is %aj = 51 thus "%shr.i = select i1 %cmp.i, i32 0, i32 %aj" will result in 0 and hence LShr instruction that follows is correct, not resulting in a poison value.

When running "opt -instcombine test.ll -S", the following IR is generated. Specifically, "%aj.op = lshr i32 3, %aj" is generated which result in a poison value since %aj is actually greater than 32.

@m = common dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @test() local_unnamed_addr #1 {
entry:
  call fastcc void @foo(i32 51)
  ret i32 0
}

define internal fastcc void @o(i32 %aj) {
entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp.i3 = zext i1 %cmp.i to i32
  %conv55 = or i32 %.op2, %cmp.i3
  store i32 %conv55, i32* @m, align 4
  ret void
}

************************

The cause is that during InstCombine, we fold 

%shr.i = select i1 %cmp.i, i32 0, i32 %aj
%cond.i1 = lshr i32 3, %shr.i

into 

%aj.op = lshr i32 3, %aj       
%cond.i1 = select i1 %cmp.i, i32 3, i32 %aj.op,

thus generating the incorrect LShr instruction "%aj.op = lshr i32 3, %aj".

****************************

Our proposed fix:

In function InstCombiner::FoldOpIntoSelect() from Transforms/InstCombine/InstructionCombining.cpp, add the following check such that we only fold LShr into select when both the true value and the false value are known to be constants. 

  // If op is a LShr instruction, we can only fold into select when both TV
  // and FV are known to be constants, otherwise they may be evaluated to be
  // greater than the width of the first operand of LShr, resulting in for
  // example:
  //     LShr i32 X Y where Y > 32,
  // which is not suppose to be generated.
  if (Op.getOpcode() == Instruction::LShr &&
      !(isa<Constant>(TV) && isa<Constant>(FV))) {
    return nullptr;
  }

Comment 1 Roman Lebedev 2020-12-02 02:27:55 PST

So the transform that is causing miscompile is 

----------------------------------------
define i32 @src(i32 %aj) {
%entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp41 = icmp ne i32 %.op2, 0
  %cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
  %conv5 = zext i1 %cmp4 to i32
  ret i32 %conv5
}
=>
define i32 @tgt(i32 %aj) {
%entry:
  %cmp.i = icmp sgt i32 %aj, 1
  %aj.op = lshr i32 3, %aj
  %.op2 = and i32 %aj.op, 1
  %cmp41 = icmp ne i32 %.op2, 0
  %cmp4 = or i1 %cmp.i, %cmp41
  %conv5 = zext i1 %cmp4 to i32
  ret i32 %conv5
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i32 %aj = #x00000020 (32)

Source:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = #x1 (1)
i32 %conv5 = #x00000001 (1)

Target:
i1 %cmp.i = #x1 (1)
i32 %aj.op = poison
i32 %.op2 = poison
i1 %cmp41 = poison
i1 %cmp4 = poison
i32 %conv5 = poison
Source value: #x00000001 (1)
Target value: poison


I.e.  

----------------------------------------
define i1 @src(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
  ret i1 %cmp4
}
=>
define i1 @tgt(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = or i1 %cmp.i, %cmp41
  ret i1 %cmp4
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i1 %cmp.i = #x1 (1)
i1 %cmp41 = poison

Source:
i1 %cmp4 = #x1 (1)

Target:
i1 %cmp4 = poison
Source value: #x1 (1)
Target value: poison

Comment 2 Roman Lebedev 2020-12-02 03:14:40 PST

... time to apply freeze?

Comment 3 Juneyoung Lee 2020-12-02 04:25:53 PST

Would it be a big change if optimizations that deal with or/and are updated to deal with the corresponding select patterns?

Comment 4 Nuno Lopes 2020-12-02 04:29:00 PST

I tend to agree. I don't see much value in turning select -> and/or + freeze. I would rather keep the select and make sure the backends are happy with it.
I think Sanjay already did some work in SelDAG to make it like selects more AFAIR.

Comment 5 Sanjay Patel 2020-12-02 05:48:04 PST

Yes, we've done some work in codegen to make it more robust with select in IR. We've also tried to patch instcombine to make it more flexible about recognizing patterns with selects.

If I'm seeing it correctly, this is one of the transforms noted here:
https://reviews.llvm.org/D72396

So we need to remove this block:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2627

And see what regresses?

Comment 6 Nuno Lopes 2020-12-02 06:13:50 PST

> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
> 
> And see what regresses?

Sounds like a plan! :)
(that whole block seems incorrect, yes)

Comment 7 Congzhe Cao 2020-12-02 14:19:23 PST

(In reply to Sanjay Patel from comment #5)
> Yes, we've done some work in codegen to make it more robust with select in
> IR. We've also tried to patch instcombine to make it more flexible about
> recognizing patterns with selects.
> 
> If I'm seeing it correctly, this is one of the transforms noted here:
> https://reviews.llvm.org/D72396
> 
> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
> 
> And see what regresses?

Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> Yes, we've done some work in codegen to make it more robust with select in
> IR. We've also tried to patch instcombine to make it more flexible about
> recognizing patterns with selects.
> 
> If I'm seeing it correctly, this is one of the transforms noted here:
> https://reviews.llvm.org/D72396
> 
> So we need to remove this block:
> https://github.com/llvm/llvm-project/blob/
> f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> InstCombineSelect.cpp#L2627
> 
> And see what regresses?

Thanks for the suggestion! Are you going to upstream it? We are going to internally implement a patch as suggested and measure the performance anyways, so we could just post the bug/solution/result to Phabricator if that makes things easier. I'd appreciate your comment on it.

Comment 8 Congzhe Cao 2020-12-02 21:40:08 PST

(In reply to Nuno Lopes from comment #4)
> I tend to agree. I don't see much value in turning select -> and/or +
> freeze. I would rather keep the select and make sure the backends are happy
> with it.
> I think Sanjay already did some work in SelDAG to make it like selects more
> AFAIR.

Thanks for the comment. I'm just curious what would be the potential solution of "turning select -> and/or + freeze", what does it look like? Is it like just freezing the result of and/or instruction?

Comment 9 Juneyoung Lee 2020-12-02 21:56:52 PST

The solution that uses freeze is to insert freeze at the operand or and/or coming from non-conditional value of select:

select x, true, y -> or x, freeze(y)
select x, y, false -> and x, freeze(y)

Comment 10 Juneyoung Lee 2020-12-02 22:23:48 PST

Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

Comment 11 Congzhe Cao 2020-12-03 06:31:29 PST

(In reply to Juneyoung Lee from comment #10)
> Here is the test: https://alive2.llvm.org/ce/z/dwFA_n

Thanks for the demo! it makes perfect sense.

Comment 12 Sanjay Patel 2020-12-08 09:10:57 PST

(In reply to Congzhe Cao from comment #7)
> (In reply to Sanjay Patel from comment #5)
> > Yes, we've done some work in codegen to make it more robust with select in
> > IR. We've also tried to patch instcombine to make it more flexible about
> > recognizing patterns with selects.
> > 
> > If I'm seeing it correctly, this is one of the transforms noted here:
> > https://reviews.llvm.org/D72396
> > 
> > So we need to remove this block:
> > https://github.com/llvm/llvm-project/blob/
> > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > InstCombineSelect.cpp#L2627
> > 
> > And see what regresses?
> 
> Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> > Yes, we've done some work in codegen to make it more robust with select in
> > IR. We've also tried to patch instcombine to make it more flexible about
> > recognizing patterns with selects.
> > 
> > If I'm seeing it correctly, this is one of the transforms noted here:
> > https://reviews.llvm.org/D72396
> > 
> > So we need to remove this block:
> > https://github.com/llvm/llvm-project/blob/
> > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > InstCombineSelect.cpp#L2627
> > 
> > And see what regresses?
> 
> Thanks for the suggestion! Are you going to upstream it? We are going to
> internally implement a patch as suggested and measure the performance
> anyways, so we could just post the bug/solution/result to Phabricator if
> that makes things easier. I'd appreciate your comment on it.

Sorry, I missed that comment initially. If you can collect some perf measurements with that change and/or post to Phab, that would be great. 

I just looked at some of the instcombine regression test diffs, and we may need to accelerate the canonicalization to the min/max intrinsics to avoid problems. But we should check codegen for various targets to see if those are really regressions that we must address.

Comment 13 Congzhe Cao 2020-12-10 12:22:58 PST

(In reply to Sanjay Patel from comment #12)
> (In reply to Congzhe Cao from comment #7)
> > (In reply to Sanjay Patel from comment #5)
> > > Yes, we've done some work in codegen to make it more robust with select in
> > > IR. We've also tried to patch instcombine to make it more flexible about
> > > recognizing patterns with selects.
> > > 
> > > If I'm seeing it correctly, this is one of the transforms noted here:
> > > https://reviews.llvm.org/D72396
> > > 
> > > So we need to remove this block:
> > > https://github.com/llvm/llvm-project/blob/
> > > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > > InstCombineSelect.cpp#L2627
> > > 
> > > And see what regresses?
> > 
> > Thanks for the suggestion! (In reply to Sanjay Patel from comment #5)
> > > Yes, we've done some work in codegen to make it more robust with select in
> > > IR. We've also tried to patch instcombine to make it more flexible about
> > > recognizing patterns with selects.
> > > 
> > > If I'm seeing it correctly, this is one of the transforms noted here:
> > > https://reviews.llvm.org/D72396
> > > 
> > > So we need to remove this block:
> > > https://github.com/llvm/llvm-project/blob/
> > > f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/
> > > InstCombineSelect.cpp#L2627
> > > 
> > > And see what regresses?
> > 
> > Thanks for the suggestion! Are you going to upstream it? We are going to
> > internally implement a patch as suggested and measure the performance
> > anyways, so we could just post the bug/solution/result to Phabricator if
> > that makes things easier. I'd appreciate your comment on it.
> 
> Sorry, I missed that comment initially. If you can collect some perf
> measurements with that change and/or post to Phab, that would be great. 
> 
> I just looked at some of the instcombine regression test diffs, and we may
> need to accelerate the canonicalization to the min/max intrinsics to avoid
> problems. But we should check codegen for various targets to see if those
> are really regressions that we must address.

Thanks for the reply! I followed the suggestion that removes this block:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2627.

But then the built clang crashes when building some tests. The cause is that, since that block has been removed, when we now process instructions like "select i1 Condition, i1 TrueValue, i1 FalseValue", we'll enter the following code block to optimize it:
https://github.com/llvm/llvm-project/blob/f03c21df7b845e2ffcef42f242764f36603fdbb4/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp#L2669

And this results in "zext i1 to i1", causing the compiler to crash.

My fix to the crash is that, in the condition checking (line 2669), adding the following checks to make sure TrueVal and FalseVal are not i1 type, which now looks like:

 if (SelType->isIntOrIntVectorTy() &&
      CondVal->getType()->isVectorTy() == SelType->isVectorTy() &&
      (TrueVal->getType()->isIntegerTy() &&
       TrueVal->getType()->getIntegerBitWidth() != 1) &&
      (FalseVal->getType()->isIntegerTy() &&
       FalseVal->getType()->getIntegerBitWidth() != 1))

I'm wondering if you have comments on this fix? If it looks fine I'll go ahead measuring the performance. Thanks!

Comment 14 Congzhe Cao 2020-12-10 13:37:42 PST

Posted the fix suggested to Phabricator as work-in-progress, will update the progress and performance numbers there. Any review is welcome, please just let me know if I happen to miss someone.

https://reviews.llvm.org/D93065

Comment 15 Congzhe Cao 2020-12-22 12:07:02 PST

*** Bug 48435 has been marked as a duplicate of this bug. ***