Missed optimization for ((X ashr C1) & C2) == 0 #47984

davidbolvansky · 2020-12-31T14:44:34Z


Bugzilla Link	48640
Resolution	FIXED
Resolved on	Oct 18, 2021 06:06
Version	trunk
OS	Linux
CC	@RKSimon,@rotateright
Fixed by commit(s)	`288f3fc`

Extended Description

void foo();

void bad1(int e)
{
if (((e >> 31) & 64) == 0) foo();
}

void bad2(int e)
{
if (((e >> 31) & 64) != 0) foo();
}

void ok1(unsigned e)
{
if (((e >> 31) & 64) == 0) foo();
}

void ok2(unsigned e)
{
if (((e >> 31) & 64) != 0) foo();
}

LLVM:
bad1(int): # @bad1(int)
shr edi, 25
test dil, 64
jne .LBB0_1
jmp foo() # TAILCALL
.LBB0_1:
ret
bad2(int): # @bad2(int)
shr edi, 25
test dil, 64
jne .LBB1_2
ret
.LBB1_2:
jmp foo() # TAILCALL
ok1(unsigned int): # @ok1(unsigned int)
jmp foo() # TAILCALL
ok2(unsigned int): # @ok2(unsigned int)
ret

GCC:
bad1(int):
test edi, edi
jns .L4
ret
.L4:
jmp foo()
bad2(int):
test edi, edi
js .L7
ret
.L7:
jmp foo()
ok1(unsigned int):
jmp foo()
ok2(unsigned int):
ret

https://godbolt.org/z/x85bf6

So..
((X ashr C1) & C2) != 0 to X < 0

..and
((X ashr C1) & C2) == 0 to X >= 0?

define i1 @src(i32 %0) {
%1:
%2 = ashr i32 %0, 31
%3 = and i32 %2, 64
%4 = icmp eq i32 %3, 0
ret i1 %4
}
=>
define i1 @tgt(i32 %0) {
%1:
%r = icmp sge i32 %0, 0
ret i1 %r
}
Transformation seems to be correct!

define i1 @src(i32 %0) {
%1:
%2 = ashr i32 %0, 31
%3 = and i32 %2, 64
%4 = icmp ne i32 %3, 0
ret i1 %4
}
=>
define i1 @tgt(i32 %0) {
%1:
%r = icmp slt i32 %0, 0
ret i1 %r
}
Transformation seems to be correct!

The text was updated successfully, but these errors were encountered:

rotateright · 2021-01-01T18:12:39Z

The interesting part of this one is how to generalize the pre-conditions on the constants. I think what we're asking is: "Is the mask constant only keeping bits that are guaranteed to be the same as the sign bit?"

There might be a better way to write that, but I came up with this:
https://rise4fun.com/Alive/Z0UF

Name: eq
Pre: C2 != 0 && (~(-1 u>> C1) & C2) == C2
%s = ashr i32 %x, C1
%a = and i32 %s, C2
%r = icmp eq i32 %a, 0
=>
%r = icmp sge i32 %x, 0

davidbolvansky · 2021-01-01T19:32:19Z

gcc supports this optimization with C1 as a valid shift count and C2 power of two.

“If (C2 << C1) doesn't overflow, then ((X >> C1) & C2) != 0
can be rewritten as (X & (C2 << C1)) != 0.”

Else (signed case): see above.

And it seems LLVM handles the case with no overflow.

rotateright · 2021-01-04T18:26:42Z

We miss even more basic patterns (without the 'and' mask):
https://reviews.llvm.org/D94014

RKSimon · 2021-10-17T15:11:58Z

We miss even more basic patterns (without the 'and' mask):
https://reviews.llvm.org/D94014

This part landed at rG288f3fc5dfee

Not sure if there were any specific followups but the code in trunk looks to be fixed.

@spatel - OK to resolve? (and do you recall if any other patches were necessary?)

rotateright · 2021-10-18T13:06:57Z

From the debug spew of instcombine, we now canonicalize the ashr+and to icmp+select, so that's the very recent:
https://reviews.llvm.org/D111410

We might've have managed to reduce it through some other combination of folds before last week, but either way, the examples here are now fixed.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missed optimization for ((X ashr C1) & C2) == 0 #47984

Missed optimization for ((X ashr C1) & C2) == 0 #47984

davidbolvansky commented Dec 31, 2020

rotateright commented Jan 1, 2021

davidbolvansky commented Jan 1, 2021

rotateright commented Jan 4, 2021

RKSimon commented Oct 17, 2021

rotateright commented Oct 18, 2021

Missed optimization for ((X ashr C1) & C2) == 0 #47984

Missed optimization for ((X ashr C1) & C2) == 0 #47984

Comments

davidbolvansky commented Dec 31, 2020

Extended Description

rotateright commented Jan 1, 2021

davidbolvansky commented Jan 1, 2021

rotateright commented Jan 4, 2021

RKSimon commented Oct 17, 2021

rotateright commented Oct 18, 2021