Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional branches are exchanged unoptimally #48680

Closed
weiguozhi opened this issue Feb 24, 2021 · 12 comments
Closed

Conditional branches are exchanged unoptimally #48680

weiguozhi opened this issue Feb 24, 2021 · 12 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@weiguozhi
Copy link
Contributor

Bugzilla Link 49336
Resolution FIXED
Resolved on Apr 14, 2021 06:44
Version trunk
OS Linux
CC @topperc,@LebedevRI,@RKSimon,@phoebewang,@rotateright
Fixed by commit(s) 661cc71

Extended Description

The following code is extracted from zippy. Compile it with -O2

#include
#define PREDICT_FALSE(x) __builtin_expect((x), 0)

volatile int g;

std::pair<char*, int>
foo(char* ip, char* ip_limit, int op) {
do {
char* old_ip = ip;
int tag_type = g;
ip = ip + 1;
int offset = *old_ip;
if (offset < 8) {
ip = ip + 2;
}
int delta = op - offset;
if (PREDICT_FALSE(delta < 0)) {
if (tag_type != 0) {
ip = old_ip;
break;
}
}
op += 8;
} while (ip < ip_limit);
return {ip, op};
}

LLVM generates:

_Z3fooPcS_i: # @​_Z3fooPcS_i
.cfi_startproc

%bb.0: # %entry

    movq    %rdi, %rax
    jmp     .LBB0_1
    .p2align        4, 0x90

.LBB0_3: # %cleanup
# in Loop: Header=BB0_1 Depth=1
movb %r8b, %cl
leaq (%rax,%rcx,2), %rax
addq $1, %rax
addl $8, %edx
cmpq %rsi, %rax
jae .LBB0_4
.LBB0_1: # %do.body
# =>This Inner Loop Header: Depth=1
movl g(%rip), %r9d
movsbl (%rax), %edi
xorl %ecx, %ecx
cmpl $8, %edi
setl %r8b
testl %r9d, %r9d // if (tag_type != 0)
je .LBB0_3

%bb.2: # %do.body

                                    #   in Loop: Header=BB0_1 Depth=1
    cmpl    %edi, %edx              // if (PREDICT_FALSE(delta < 0))
    jge     .LBB0_3

.LBB0_4: # %do.end
retq

The condition (tag_type != 0) is unpredictable, but (delta < 0) is predicted as false, so the unpredictable condition (tag_type != 0) is rarely executed.
But in the generated code, llvm changed the order of the two conditions, so the unpredictable condition is always executed, causes more branch mis-prediction.

@topperc
Copy link
Collaborator

topperc commented Feb 24, 2021

Was it reordered in IR or the backend?

@weiguozhi
Copy link
Contributor Author

In LLVM IR the two conditions are combined

%cmp2 = icmp slt i32 %op.addr.0, %conv
%cmp5 = icmp ne i32 %0, 0
%or.cond = and i1 %cmp5, %cmp2
br i1 %or.cond, label %do.end, label %cleanup

In instruction selection, it was changed to two conditional branches in bad order.

@topperc
Copy link
Collaborator

topperc commented Feb 24, 2021

As far as I know the backend just processes the operands of the AND in order. This is done in SelectionDAGBuilder::visitBR I think. It looks like the != check is first.

I'm guessing this was reordered by the reassociate pass in IR. Does this patch fix it https://reviews.llvm.org/D94089

@LebedevRI
Copy link
Member

I still don't agree that we should be viewing this kind of problem as if
it's root cause is commutativity of and/or binops.
For example, isn't it a problem that we've lost predictability annotations
along the way? Perhaps we should be doing something along the lines of

%z = icmp slt i32 %delta, 0
call @&#8203;llvm.assume()[ "unlikely"(%z) ]

?

@rotateright
Copy link
Contributor

I still don't agree that we should be viewing this kind of problem as if
it's root cause is commutativity of and/or binops.
For example, isn't it a problem that we've lost predictability annotations
along the way? Perhaps we should be doing something along the lines of

%z = icmp slt i32 %delta, 0
call @&#8203;llvm.assume()[ "unlikely"(%z) ]

?

That sounds right to me. This might require a pipeline adjustment as the first step: we run -simplifycfg before -lower-expect for some reason.

So we've already logically combined the conditions before turning the expect intrinsic into metadata:

%cmp = icmp slt i32 %0, 0
%conv = zext i1 %cmp to i64
%expval = call i64 @​llvm.expect.i64(i64 %conv, i64 0)
%tobool = icmp ne i64 %expval, 0
%1 = load i32, i32* %tag_type.addr, align 4
%cmp1 = icmp ne i32 %1, 0
%or.cond = and i1 %tobool, %cmp1
br i1 %or.cond, label %if.then2, label %if.end3

And if I'm seeing it correctly, -lower-expect just drops the intrinsic for that pattern. No profile metadata is created, so there's no hope that any later pass would be able to do anything special.

@LebedevRI
Copy link
Member

I still don't agree that we should be viewing this kind of problem as if
it's root cause is commutativity of and/or binops.
For example, isn't it a problem that we've lost predictability annotations
along the way? Perhaps we should be doing something along the lines of

%z = icmp slt i32 %delta, 0
call @&#8203;llvm.assume()[ "unlikely"(%z) ]

?

That sounds right to me.

This might require a pipeline adjustment as the
first step: we run -simplifycfg before -lower-expect for some reason.
Huh, that also sounds broken.

So we've already logically combined the conditions before turning the expect
intrinsic into metadata:

%cmp = icmp slt i32 %0, 0
%conv = zext i1 %cmp to i64
%expval = call i64 @​llvm.expect.i64(i64 %conv, i64 0)
%tobool = icmp ne i64 %expval, 0
%1 = load i32, i32* %tag_type.addr, align 4
%cmp1 = icmp ne i32 %1, 0
%or.cond = and i1 %tobool, %cmp1
br i1 %or.cond, label %if.then2, label %if.end3

And if I'm seeing it correctly, -lower-expect just drops the intrinsic for
that pattern. No profile metadata is created, so there's no hope that any
later pass would be able to do anything special.

@weiguozhi
Copy link
Contributor Author

Craig, the patch https://reviews.llvm.org/D94089 can fix this problem. Sanjay commented in that patch:

"If the expression order matters, then we want the programmer to indicate that explicitly with something like "__builtin_expect" or profile metadata."

It looks __builtin_expect doesn't actually work in this case.

I think Sanjai's analysis in #48680 #c5 is correct. The LowerExpect pass silently dropped the call to @​llvm.expect.i64 without any modification to the IR or metadata. So from then on we lost the expect information.

@weiguozhi
Copy link
Contributor Author

Any recommended solution?

@rotateright
Copy link
Contributor

Any recommended solution?

We have to start by making SimplifyCFG smarter about profile metadata. I'm looking at that now.

That first fix won't change anything for the example here though. We'll either need to modify the pass pipeline or limit SimplifyCFG even more.

Then, we'll have to make sure the metadata is respected through codegen.

@rotateright
Copy link
Contributor

We have to start by making SimplifyCFG smarter about profile metadata. I'm
looking at that now.

https://reviews.llvm.org/D98898

@rotateright
Copy link
Contributor

Proposal to change pass ordering:
https://reviews.llvm.org/D100213

@rotateright
Copy link
Contributor

This example appears to be solved with:
https://reviews.llvm.org/rG661cc71a1c50

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants