The following code is extracted from zippy. Compile it with -O2 #include <utility> #define PREDICT_FALSE(x) __builtin_expect((x), 0) volatile int g; std::pair<char*, int> foo(char* ip, char* ip_limit, int op) { do { char* old_ip = ip; int tag_type = g; ip = ip + 1; int offset = *old_ip; if (offset < 8) { ip = ip + 2; } int delta = op - offset; if (PREDICT_FALSE(delta < 0)) { if (tag_type != 0) { ip = old_ip; break; } } op += 8; } while (ip < ip_limit); return {ip, op}; } LLVM generates: _Z3fooPcS_i: # @_Z3fooPcS_i .cfi_startproc # %bb.0: # %entry movq %rdi, %rax jmp .LBB0_1 .p2align 4, 0x90 .LBB0_3: # %cleanup # in Loop: Header=BB0_1 Depth=1 movb %r8b, %cl leaq (%rax,%rcx,2), %rax addq $1, %rax addl $8, %edx cmpq %rsi, %rax jae .LBB0_4 .LBB0_1: # %do.body # =>This Inner Loop Header: Depth=1 movl g(%rip), %r9d movsbl (%rax), %edi xorl %ecx, %ecx cmpl $8, %edi setl %r8b testl %r9d, %r9d // if (tag_type != 0) je .LBB0_3 # %bb.2: # %do.body # in Loop: Header=BB0_1 Depth=1 cmpl %edi, %edx // if (PREDICT_FALSE(delta < 0)) jge .LBB0_3 .LBB0_4: # %do.end retq The condition (tag_type != 0) is unpredictable, but (delta < 0) is predicted as false, so the unpredictable condition (tag_type != 0) is rarely executed. But in the generated code, llvm changed the order of the two conditions, so the unpredictable condition is always executed, causes more branch mis-prediction.
Was it reordered in IR or the backend?
In LLVM IR the two conditions are combined %cmp2 = icmp slt i32 %op.addr.0, %conv %cmp5 = icmp ne i32 %0, 0 %or.cond = and i1 %cmp5, %cmp2 br i1 %or.cond, label %do.end, label %cleanup In instruction selection, it was changed to two conditional branches in bad order.
As far as I know the backend just processes the operands of the AND in order. This is done in SelectionDAGBuilder::visitBR I think. It looks like the != check is first. I'm guessing this was reordered by the reassociate pass in IR. Does this patch fix it https://reviews.llvm.org/D94089
I still don't agree that we should be viewing this kind of problem as if it's root cause is commutativity of `and`/`or` binops. For example, isn't it a problem that we've lost predictability annotations along the way? Perhaps we should be doing something along the lines of ``` %z = icmp slt i32 %delta, 0 call @llvm.assume()[ "unlikely"(%z) ] ``` ?
(In reply to Roman Lebedev from comment #4) > I still don't agree that we should be viewing this kind of problem as if > it's root cause is commutativity of `and`/`or` binops. > For example, isn't it a problem that we've lost predictability annotations > along the way? Perhaps we should be doing something along the lines of > ``` > %z = icmp slt i32 %delta, 0 > call @llvm.assume()[ "unlikely"(%z) ] > ``` > ? That sounds right to me. This might require a pipeline adjustment as the first step: we run -simplifycfg before -lower-expect for some reason. So we've already logically combined the conditions before turning the expect intrinsic into metadata: %cmp = icmp slt i32 %0, 0 %conv = zext i1 %cmp to i64 %expval = call i64 @llvm.expect.i64(i64 %conv, i64 0) %tobool = icmp ne i64 %expval, 0 %1 = load i32, i32* %tag_type.addr, align 4 %cmp1 = icmp ne i32 %1, 0 %or.cond = and i1 %tobool, %cmp1 br i1 %or.cond, label %if.then2, label %if.end3 And if I'm seeing it correctly, -lower-expect just drops the intrinsic for that pattern. No profile metadata is created, so there's no hope that any later pass would be able to do anything special.
(In reply to Sanjay Patel from comment #5) > (In reply to Roman Lebedev from comment #4) > > I still don't agree that we should be viewing this kind of problem as if > > it's root cause is commutativity of `and`/`or` binops. > > For example, isn't it a problem that we've lost predictability annotations > > along the way? Perhaps we should be doing something along the lines of > > ``` > > %z = icmp slt i32 %delta, 0 > > call @llvm.assume()[ "unlikely"(%z) ] > > ``` > > ? > > That sounds right to me. > This might require a pipeline adjustment as the > first step: we run -simplifycfg before -lower-expect for some reason. Huh, that also sounds broken. > So we've already logically combined the conditions before turning the expect > intrinsic into metadata: > > %cmp = icmp slt i32 %0, 0 > %conv = zext i1 %cmp to i64 > %expval = call i64 @llvm.expect.i64(i64 %conv, i64 0) > %tobool = icmp ne i64 %expval, 0 > %1 = load i32, i32* %tag_type.addr, align 4 > %cmp1 = icmp ne i32 %1, 0 > %or.cond = and i1 %tobool, %cmp1 > br i1 %or.cond, label %if.then2, label %if.end3 > > And if I'm seeing it correctly, -lower-expect just drops the intrinsic for > that pattern. No profile metadata is created, so there's no hope that any > later pass would be able to do anything special.
Craig, the patch https://reviews.llvm.org/D94089 can fix this problem. Sanjay commented in that patch: "If the expression order matters, then we want the programmer to indicate that explicitly with something like "__builtin_expect" or profile metadata." It looks __builtin_expect doesn't actually work in this case. I think Sanjai's analysis in https://bugs.llvm.org/show_bug.cgi?id=49336#c5 is correct. The LowerExpect pass silently dropped the call to @llvm.expect.i64 without any modification to the IR or metadata. So from then on we lost the expect information.
Any recommended solution?
(In reply to Carrot from comment #8) > Any recommended solution? We have to start by making SimplifyCFG smarter about profile metadata. I'm looking at that now. That first fix won't change anything for the example here though. We'll either need to modify the pass pipeline or limit SimplifyCFG even more. Then, we'll have to make sure the metadata is respected through codegen.
(In reply to Sanjay Patel from comment #9) > We have to start by making SimplifyCFG smarter about profile metadata. I'm > looking at that now. https://reviews.llvm.org/D98898
Proposal to change pass ordering: https://reviews.llvm.org/D100213
This example appears to be solved with: https://reviews.llvm.org/rG661cc71a1c50