49336 – Conditional branches are exchanged unoptimally

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 49336 - Conditional branches are exchanged unoptimally

Summary: Conditional branches are exchanged unoptimally

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-02-23 17:47 PST by Carrot
Modified:	2021-04-14 06:44 PDT (History)
CC List:	6 users (show)

See Also:
Fixed By Commit(s):	661cc71a1c50

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Carrot 2021-02-23 17:47:54 PST

The following code is extracted from zippy. Compile it with -O2

#include <utility>
#define PREDICT_FALSE(x) __builtin_expect((x), 0)

volatile int g;

std::pair<char*, int>
foo(char* ip, char* ip_limit, int op) {
    do {
      char* old_ip = ip; 
      int tag_type = g;
      ip = ip + 1;
      int offset = *old_ip;
      if (offset < 8) {
          ip = ip + 2;
      }   
      int delta = op - offset;
      if (PREDICT_FALSE(delta < 0)) {     
        if (tag_type != 0) {             
            ip = old_ip;
            break;
        }
      }   
      op += 8;
    } while (ip < ip_limit);
  return {ip, op};
}

LLVM generates:

_Z3fooPcS_i:                            # @_Z3fooPcS_i
        .cfi_startproc
# %bb.0:                                # %entry
        movq    %rdi, %rax
        jmp     .LBB0_1
        .p2align        4, 0x90
.LBB0_3:                                # %cleanup
                                        #   in Loop: Header=BB0_1 Depth=1
        movb    %r8b, %cl 
        leaq    (%rax,%rcx,2), %rax
        addq    $1, %rax
        addl    $8, %edx
        cmpq    %rsi, %rax
        jae     .LBB0_4
.LBB0_1:                                # %do.body
                                        # =>This Inner Loop Header: Depth=1
        movl    g(%rip), %r9d
        movsbl  (%rax), %edi
        xorl    %ecx, %ecx
        cmpl    $8, %edi
        setl    %r8b
        testl   %r9d, %r9d              // if (tag_type != 0)
        je      .LBB0_3
# %bb.2:                                # %do.body
                                        #   in Loop: Header=BB0_1 Depth=1
        cmpl    %edi, %edx              // if (PREDICT_FALSE(delta < 0))
        jge     .LBB0_3
.LBB0_4:                                # %do.end
        retq

The condition (tag_type != 0) is unpredictable, but (delta < 0) is predicted as false, so the unpredictable condition (tag_type != 0) is rarely executed.
But in the generated code, llvm changed the order of the two conditions, so the unpredictable condition is always executed, causes more branch mis-prediction.

Comment 1 Craig Topper 2021-02-23 18:07:12 PST

Was it reordered in IR or the backend?

Comment 2 Carrot 2021-02-23 18:32:07 PST

In LLVM IR the two conditions are combined

  %cmp2 = icmp slt i32 %op.addr.0, %conv 
  %cmp5 = icmp ne i32 %0, 0 
  %or.cond = and i1 %cmp5, %cmp2 
  br i1 %or.cond, label %do.end, label %cleanup

In instruction selection, it was changed to two conditional branches in bad order.

Comment 3 Craig Topper 2021-02-23 18:53:41 PST

As far as I know the backend just processes the operands of the AND in order. This is done in SelectionDAGBuilder::visitBR I think. It looks like the != check is first.

I'm guessing this was reordered by the reassociate pass in IR. Does this patch fix it https://reviews.llvm.org/D94089

Comment 4 Roman Lebedev 2021-02-24 08:08:21 PST

I still don't agree that we should be viewing this kind of problem as if
it's root cause is commutativity of `and`/`or` binops.
For example, isn't it a problem that we've lost predictability annotations
along the way? Perhaps we should be doing something along the lines of
```
%z = icmp slt i32 %delta, 0
call @llvm.assume()[ "unlikely"(%z) ]
```
?

Comment 5 Sanjay Patel 2021-02-24 09:02:03 PST

(In reply to Roman Lebedev from comment #4)
> I still don't agree that we should be viewing this kind of problem as if
> it's root cause is commutativity of `and`/`or` binops.
> For example, isn't it a problem that we've lost predictability annotations
> along the way? Perhaps we should be doing something along the lines of
> ```
> %z = icmp slt i32 %delta, 0
> call @llvm.assume()[ "unlikely"(%z) ]
> ```
> ?

That sounds right to me. This might require a pipeline adjustment as the first step: we run -simplifycfg before -lower-expect for some reason. 

So we've already logically combined the conditions before turning the expect intrinsic into metadata:

  %cmp = icmp slt i32 %0, 0
  %conv = zext i1 %cmp to i64
  %expval = call i64 @llvm.expect.i64(i64 %conv, i64 0)
  %tobool = icmp ne i64 %expval, 0
  %1 = load i32, i32* %tag_type.addr, align 4
  %cmp1 = icmp ne i32 %1, 0
  %or.cond = and i1 %tobool, %cmp1
  br i1 %or.cond, label %if.then2, label %if.end3

And if I'm seeing it correctly, -lower-expect just drops the intrinsic for that pattern. No profile metadata is created, so there's no hope that any later pass would be able to do anything special.

Comment 6 Roman Lebedev 2021-02-24 09:06:35 PST

(In reply to Sanjay Patel from comment #5)
> (In reply to Roman Lebedev from comment #4)
> > I still don't agree that we should be viewing this kind of problem as if
> > it's root cause is commutativity of `and`/`or` binops.
> > For example, isn't it a problem that we've lost predictability annotations
> > along the way? Perhaps we should be doing something along the lines of
> > ```
> > %z = icmp slt i32 %delta, 0
> > call @llvm.assume()[ "unlikely"(%z) ]
> > ```
> > ?
> 
> That sounds right to me.

> This might require a pipeline adjustment as the
> first step: we run -simplifycfg before -lower-expect for some reason. 
Huh, that also sounds broken.

> So we've already logically combined the conditions before turning the expect
> intrinsic into metadata:
> 
>   %cmp = icmp slt i32 %0, 0
>   %conv = zext i1 %cmp to i64
>   %expval = call i64 @llvm.expect.i64(i64 %conv, i64 0)
>   %tobool = icmp ne i64 %expval, 0
>   %1 = load i32, i32* %tag_type.addr, align 4
>   %cmp1 = icmp ne i32 %1, 0
>   %or.cond = and i1 %tobool, %cmp1
>   br i1 %or.cond, label %if.then2, label %if.end3
> 
> And if I'm seeing it correctly, -lower-expect just drops the intrinsic for
> that pattern. No profile metadata is created, so there's no hope that any
> later pass would be able to do anything special.

Comment 7 Carrot 2021-02-24 16:41:16 PST

Craig, the patch https://reviews.llvm.org/D94089 can fix this problem. Sanjay commented in that patch:

"If the expression order matters, then we want the programmer to indicate that explicitly with something like "__builtin_expect" or profile metadata."

It looks __builtin_expect doesn't actually work in this case.

I think Sanjai's analysis in https://bugs.llvm.org/show_bug.cgi?id=49336#c5 is correct. The LowerExpect pass silently dropped the call to @llvm.expect.i64 without any modification to the IR or metadata. So from then on we lost the expect information.

Comment 8 Carrot 2021-03-05 10:02:03 PST

Any recommended solution?

Comment 9 Sanjay Patel 2021-03-18 07:11:25 PDT

(In reply to Carrot from comment #8)
> Any recommended solution?

We have to start by making SimplifyCFG smarter about profile metadata. I'm looking at that now.

That first fix won't change anything for the example here though. We'll either need to modify the pass pipeline or limit SimplifyCFG even more. 

Then, we'll have to make sure the metadata is respected through codegen.

Comment 10 Sanjay Patel 2021-03-18 14:32:37 PDT

(In reply to Sanjay Patel from comment #9)
> We have to start by making SimplifyCFG smarter about profile metadata. I'm
> looking at that now.

https://reviews.llvm.org/D98898

Comment 11 Sanjay Patel 2021-04-09 09:41:11 PDT

Proposal to change pass ordering:
https://reviews.llvm.org/D100213

Comment 12 Sanjay Patel 2021-04-14 06:44:09 PDT

This example appears to be solved with:
https://reviews.llvm.org/rG661cc71a1c50