43198 – LLVM should better understand subtraction of `zext i1`s

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.


Summary: LLVM should better understand subtraction of `zext i1`s

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations
Version:	trunk
Hardware:	PC Windows NT

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-09-02 15:29 PDT by Scott McMurray
Modified:	2019-09-03 20:33 PDT (History)
CC List:	4 users

See Also:
Fixed By Commit(s):


Description Scott McMurray 2019-09-02 15:29:29 PDT

I found this trying to make a better implementation of Rust's `Ord::cmp` for integers.

C++ repro: https://godbolt.org/z/tL0-oW
```
int spaceship(int a, int b) {
    return (a > b) - (a < b);
}

bool my_lt(int a, int b) {
    return spaceship(a, b) == -1;
}
```

`my_lt` should be foldable down to just `a < b`, but that doesn't happen; the IR is still:
```
define dso_local zeroext i1 @_Z5my_ltii(i32 %0, i32 %1) local_unnamed_addr #0 {
  %3 = icmp sgt i32 %0, %1
  %4 = zext i1 %3 to i32
  %5 = icmp slt i32 %0, %1
  %6 = zext i1 %5 to i32
  %7 = sub nsw i32 %4, %6
  %8 = icmp eq i32 %7, -1
  ret i1 %8
}
```

(Running it through `opt` again doesn't help either: https://godbolt.org/z/-hl5H0)
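Why the fold is valid: `(a > b)` and `(a < b)` are never both true, so the difference is -1 exactly when `a < b`, 0 when `a == b`, and 1 when `a > b`. No overflow is possible, since only two values in {0, 1} are subtracted. A minimal sketch that checks the equivalence exhaustively over a small range; the helper `my_lt_matches_lt` and the range are hypothetical, chosen only for illustration, and this is a sanity check rather than a proof:

```cpp
#include <cassert>

int spaceship(int a, int b) {
    return (a > b) - (a < b);
}

bool my_lt(int a, int b) {
    return spaceship(a, b) == -1;
}

// Hypothetical helper: returns true if my_lt(a, b) == (a < b) for every
// pair in [lo, hi] x [lo, hi]. Exhaustive over a small range only.
bool my_lt_matches_lt(int lo, int hi) {
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            if (my_lt(a, b) != (a < b))
                return false;
    return true;
}
```

The same reasoning applies to the IR: `%4` and `%6` are each 0 or 1 and never both 1, so `%7` is -1 only when `%6` is 1, i.e. when `icmp slt` is true.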

Comment 1 Sanjay Patel 2019-09-03 06:48:57 PDT

If we add a canonicalization that rewrites this pattern as a `select`, existing transforms should find the simplification:

```
  %gt = icmp sgt i32 %x, %y
  %zgt = zext i1 %gt to i32
  %lt = icmp slt i32 %x, %y
  %zlt = zext i1 %lt to i32
  %d = sub nsw i32 %zgt, %zlt
=>
  %d = select i1 %lt, i32 -1, i32 %zgt
```

https://rise4fun.com/Alive/pQN
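The proposed rewrite can be mirrored at the source level. Because `%zgt` is 0 or 1, the select's false arm can never be -1, so comparing the select result against -1 depends only on `%lt`; that is the step that lets `my_lt` reduce to `a < b`. A minimal C++ sketch of both steps, with hypothetical function names, checked over a small range (a sanity check, not a substitute for the Alive proof linked above):

```cpp
#include <cassert>

// Original pattern: sub nsw (zext i1 %gt), (zext i1 %lt)
int sub_form(int x, int y) {
    int zgt = x > y;   // zext i1 %gt to i32
    int zlt = x < y;   // zext i1 %lt to i32
    return zgt - zlt;  // sub nsw i32 %zgt, %zlt
}

// Proposed canonical form: select i1 %lt, i32 -1, i32 %zgt
int select_form(int x, int y) {
    bool lt = x < y;
    int zgt = x > y;
    return lt ? -1 : zgt;
}

// Hypothetical helper: checks that both forms agree, and that
// (select == -1) is equivalent to %lt, for all pairs in [lo, hi]^2.
bool canonicalization_holds(int lo, int hi) {
    for (int x = lo; x <= hi; ++x)
        for (int y = lo; y <= hi; ++y) {
            if (sub_form(x, y) != select_form(x, y))
                return false;
            if ((select_form(x, y) == -1) != (x < y))
                return false;
        }
    return true;
}
```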

Comment 2 Scott McMurray 2019-09-03 20:33:22 PDT

Note that this canonicalization would undo what I was trying to do in the first place, which is the difference shown here: https://godbolt.org/z/to7D8q
```
example::spaceship1:
        cmp     edi, esi
        seta    al
        sbb     al, 0
        ret

example::spaceship2:
        xor     ecx, ecx
        cmp     edi, esi
        seta    cl
        mov     eax, 255
        cmovae  eax, ecx
        ret
```

It's not obvious to me which of these is better in general (is a byte `sbb` or a `cmov` worse?), but llvm-mca does say the former is better on the (old) core2 CPU I'm currently running.
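For reference, the two listings compute the same three-way result and differ only in how the -1 is produced: a borrow through `sbb` versus a `cmov` of the constant 255. Below is a C++ model of each instruction sequence. It is a sketch assuming the Rust source compared `u32` values (`seta` and `cmovae` are unsigned conditions) and returned an `i8`-sized `Ordering` in `al`; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: operands are unsigned 32-bit, result is read from al as i8.

// spaceship1: cmp edi, esi; seta al; sbb al, 0
int8_t spaceship1_model(uint32_t a, uint32_t b) {
    uint8_t al = a > b;                  // seta al: 1 if a > b (unsigned)
    uint8_t cf = a < b;                  // CF after cmp: set on unsigned borrow
    al = static_cast<uint8_t>(al - cf);  // sbb al, 0: al - 0 - CF, mod 256
    // 255 -> -1 (guaranteed since C++20, and on mainstream compilers before)
    return static_cast<int8_t>(al);
}

// spaceship2: xor ecx, ecx; cmp edi, esi; seta cl; mov eax, 255; cmovae eax, ecx
int8_t spaceship2_model(uint32_t a, uint32_t b) {
    uint32_t ecx = a > b;    // xor ecx, ecx; seta cl
    uint32_t eax = 255;      // mov eax, 255 (0xFF, i.e. -1 in al)
    if (a >= b) eax = ecx;   // cmovae: moves when CF == 0, i.e. !(a < b)
    return static_cast<int8_t>(static_cast<uint8_t>(eax));  // result in al
}

// Reference three-way comparison returning -1, 0 or 1.
int8_t reference_cmp(uint32_t a, uint32_t b) {
    return static_cast<int8_t>((a > b) - (a < b));
}
```

This models only the semantics; it says nothing about which sequence is faster, which is the question llvm-mca was used to answer.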