50389 – Missed optimization for A && !B

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 50389 - Missed optimization for A && !B

Summary: Missed optimization for A && !B

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-05-18 09:43 PDT by Martijn Vels
Modified:	2021-06-18 08:32 PDT (History)
CC List:	10 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Martijn Vels 2021-05-18 09:43:03 PDT

The ABI dictates that bool be strictly 0 or 1. A bool value outside of 0 or 1 is UB

The compiler can optimize `A && !B1 then as `A > B`, which say under x86 becomes the macro fused op:

AAndNotBAlt(bool, bool):
        cmp     dil, sil
        jbe     .LBB1_1

instead, the compiler generates:
https://gcc.godbolt.org/z/EWx63b4nz

AAndNotB(bool, bool):
        test    edi, edi
        je      .LBB0_2
        test    sil, sil
        jne     .LBB0_2
        jmp     True()

Comment 1 Datta Nagraj Pathikonda 2021-06-11 08:55:59 PDT

@spatel Can we do this opt as below in the instCombine pass using the match patterns:

https://alive2.llvm.org/ce/z/dytU8_

Comment 2 Datta Nagraj Pathikonda 2021-06-11 09:08:13 PDT

But, I see that instcombine itself converts icmp to XOR and AND insts. Not sure if instcombine is the correct place for this opt:

https://godbolt.org/z/xaPdeK61d

Comment 3 Sanjay Patel 2021-06-13 06:28:41 PDT

As noted, InstCombine currently transforms icmp of bools to logic ops. 

I don't know the history of that decision, but it seems like a reasonable choice for IR because it is more natural to analyze boolean logic rather than comparisons of bools. This is despite the fact that the logic sequence may be 2 instructions instead of 1 compare.

If this is better inverted in x86 asm, then you have backend options:
1. Show that the icmp form is better for all targets and add a transform to DAGCombiner.
2. Show that the icmp form is better for >1 targets and add a transform using a TLI hook in DAGCombiner.
3. Show that the transform is always good for x86 and add a target-specific transform in X86ISelLowering.

There may be some complications here to deal with branches on merged conditions (see the TLI hook isJumpExpensive()).

The godbolt link in the description shows a different possible missing InstCombine transform:
https://alive2.llvm.org/ce/z/gsT3bD

...or that might be 2 different transforms. Be aware of poison/undef behavior to make sure that is preserved.