39480 – Failure to simplify -min(-a, a) => max(-a, a)

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 39480 - Failure to simplify -min(-a, a) => max(-a, a)

Summary: Failure to simplify -min(-a, a) => max(-a, a)

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC Windows NT

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-10-29 07:26 PDT by Simon Pilgrim
Modified:	2021-06-23 08:43 PDT (History)
CC List:	6 users (show)

See Also:	46915
Fixed By Commit(s):	664e1784cd5 5060f5682b0 8878b79cfe9 38852076515 01a6c4bd26a 59387c0dd74 16c642fa39d 0e1241a3c98 132be1f5027 141357663e6 a512c894768

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Simon Pilgrim 2018-10-29 07:26:02 PDT

#include <algorithm>
#include <x86intrin.h>

int min_neg_s32(int a)
{
    return -std::min(-a, a);
}

__m128i min_neg_v4s32(__m128i a) {
    return _mm_sub_epi32(_mm_setzero_si128(), _mm_min_epi32(_mm_sub_epi32(_mm_setzero_si128(), a), a));
}

float min_neg_f32(float a)
{
    return -std::min(-a, a);
}

https://godbolt.org/z/h4zTcS

clang: 

_Z11min_neg_s32i: # @_Z11min_neg_s32i
  movl %edi, %eax
  negl %eax
  cmpl %edi, %eax
  cmovgl %edi, %eax
  negl %eax
  retq
_Z13min_neg_v4s32Dv2_x: # @_Z13min_neg_v4s32Dv2_x
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  retq
_Z11min_neg_f32f: # @_Z11min_neg_f32f
  vmovaps .LCPI2_0(%rip), %xmm1 # xmm1 = [-0,-0,-0,-0]
  vxorps %xmm1, %xmm0, %xmm2
  vminss %xmm2, %xmm0, %xmm0
  vxorps %xmm1, %xmm0, %xmm0
  retq

gcc manages the scalar cases (s32 is interesting....) but not the vector case:

_Z11min_neg_s32i:
  movl %edi, %eax
  cltd
  xorl %edx, %eax
  subl %edx, %eax
  ret
_Z13min_neg_v4s32Dv2_x:
  vpxor %xmm1, %xmm1, %xmm1
  vpsubd %xmm0, %xmm1, %xmm2
  vpminsd %xmm0, %xmm2, %xmm0
  vpsubd %xmm0, %xmm1, %xmm0
  ret
_Z11min_neg_f32f:
  vmovss %xmm0, %xmm0, %xmm1
  vxorps .LC0(%rip), %xmm0, %xmm0
  vmaxss %xmm1, %xmm0, %xmm0
  ret

Comment 1 Sanjay Patel 2018-10-29 07:50:28 PDT

(In reply to Simon Pilgrim from comment #0)
> int min_neg_s32(int a)
> {
>     return -std::min(-a, a);
> }

That's a hard way to write 'std::abs':
https://rise4fun.com/Alive/Cha

 %negx = sub i32 0, %x
 %cmp = icmp sgt i32 %negx, %x
 %min = select i1 %cmp, i32 %x, i32 %negx
 %r = sub i32 0, %min
=>
 %cmp2 = icmp slt i32 %x, 0
 %r = select i1 %cmp2, i32 %negx, i32 %x

Comment 2 Tillmann Karras 2019-11-06 04:28:40 PST

-min(-x, x) can be generalized to -min(-x, y) => max(x, -y) (https://rise4fun.com/Alive/FBpl) but requires nsw at least on the min negation. The opposite case only works for y = x: -max(-x, x) => max(-x, x) (https://rise4fun.com/Alive/FDjL) but works for any combination of nsw flags.

Comment 3 Andrea Di Biagio 2020-02-07 09:50:38 PST

Not sure about the nsw/nuw flags, but according to Alive this is correct too:

; -min(-x, -y) => max(x, y)

%negx = sub nsw 0, %x
%negy = sub nsw 0, %y
%cmp1 = icmp slt %negx, %negy
%sel1 = select %cmp1, %negx, %negy
%r = sub nsw 0, %sel1
  =>
%cmp2 = icmp sgt %x, %y
%r = select %cmp2, %x, %y


I've noticed this while looking at the codegen of functions in llvm/test/CodeGen/X86/vector-reduce-umax.ll

Specifically, the sequence:

; X64-SSE2-NEXT:    movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
; X64-SSE2-NEXT:    pxor %xmm2, %xmm0
; X64-SSE2-NEXT:    pxor %xmm2, %xmm1
; X64-SSE2-NEXT:    pminsw %xmm0, %xmm1
; X64-SSE2-NEXT:    movdqa %xmm1, %xmm0
; X64-SSE2-NEXT:    pxor %xmm2, %xmm0


Which is basically doing a ` - MIN (-A, -B)`

And could be rewritten as a simple ` MAX(A, B)`:

; X64-SSE2-NEXT:    pmaxsw %xmm0, %xmm1

Comment 4 Andrea Di Biagio 2020-02-07 09:58:48 PST

Acutally sorry, the original code from that test was trying to do float negations (doing a xor of the sign bit).

Anyway, this is the Alive link: https://rise4fun.com/Alive/5Y7

Comment 5 Tillmann Karras 2020-02-07 12:39:12 PST

Note that

-min(-x, y) => max(x, -y)

is more general than

-min(-x, -y) => max(x, y)

it just requires an extra -(-y) => y step.

Comment 6 Roman Lebedev 2020-08-05 14:24:05 PDT

Nowadays with the help of Negator we get:

define dso_local i32 @_Z11min_neg_s32i(i32 %0) local_unnamed_addr #0 {
  %2 = sub nsw i32 0, %0
  %3 = icmp sgt i32 %2, %0
  %4 = select i1 %3, i32 %2, i32 %0
  ret i32 %4
}

So i think the transform we are missing is 

Name: (-x) s> x  -->  x s< 0 
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sgt i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

Name: (-x) s>= x  -->  x s< 1 
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sge i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 1

+ commutative variants. I'm not seeing a generalization for `(-x) s?? y` case.

https://rise4fun.com/Alive/moSi

Ignoring the question of intrinsics, i can take a look

Comment 7 Roman Lebedev 2020-08-06 01:55:06 PDT

Added folds for `(-NSW x) pred x` --> `x pred' 0` folds.
We now seem to handle the original (integer!) case well:
https://godbolt.org/z/qPPn97

Comment 8 Roman Lebedev 2020-08-06 02:02:39 PDT

(In reply to Roman Lebedev from comment #7)
> Added folds for `(-NSW x) pred x` --> `x pred' 0` folds.
> We now seem to handle the original (integer!) case well:
> https://godbolt.org/z/qPPn97

... but don't handle the vector case, since it lacks NSW.
I think it should be done after canonicalization to the new intrinsics.

Comment 9 Roman Lebedev 2021-05-13 04:12:05 PDT

Hm, i lost track here. Let me revisit this...

Comment 10 Sanjay Patel 2021-05-13 05:13:26 PDT

(In reply to Roman Lebedev from comment #9)
> Hm, i lost track here. Let me revisit this...

There are multiple examples here, so I'm not sure what's left either.

On the FP side, we probably just need to hoist an fneg through a select (fneg is a unary op, so it misses the binop transforms we would try with a select operand):
https://alive2.llvm.org/ce/z/7FKWLP

We want to make sure that FMF get propagated through those transforms, so it needs a pile of tests. Let me know if I should work on that part.

Comment 11 Roman Lebedev 2021-05-13 10:57:34 PDT

(In reply to Tillmann Karras from comment #5)
> Note that
> 
> -min(-x, y) => max(x, -y)
> 
> is more general than
> 
> -min(-x, -y) => max(x, y)
> 
> it just requires an extra -(-y) => y step.

And once again, i'm not sure what specific transform we are missing?
> -min(-x, y) => max(x, -y)
... seems to not be valid:
https://alive2.llvm.org/ce/z/5grsyy

(In reply to Sanjay Patel from comment #10)
> (In reply to Roman Lebedev from comment #9)
> > Hm, i lost track here. Let me revisit this...
> 
> There are multiple examples here, so I'm not sure what's left either.
> 
> On the FP side, we probably just need to hoist an fneg through a select
> (fneg is a unary op, so it misses the binop transforms we would try with a
> select operand):
> https://alive2.llvm.org/ce/z/7FKWLP
> 
> We want to make sure that FMF get propagated through those transforms, so it
> needs a pile of tests. Let me know if I should work on that part.

Comment 12 Roman Lebedev 2021-05-13 10:58:25 PDT

Posted too soon.

@Sanjay yes, i don't intend to touch FP side of things, feel free to.

Comment 13 Roman Lebedev 2021-05-13 12:13:13 PDT

Ok, i'm seeing one missed Negator pattern:
-({u,s}{min,max}(-x, x)) -> -({u,s}{max,min}(-x, x))
https://alive2.llvm.org/ce/z/xyP9U2 
https://alive2.llvm.org/ce/z/uDVZGV
https://alive2.llvm.org/ce/z/wfMJ4-
https://alive2.llvm.org/ce/z/vrrc-5

Comment 14 Sanjay Patel 2021-05-17 12:26:37 PDT

FP neg fold:
https://reviews.llvm.org/rG3cdd05e519dd

I think that gives us the optimal code for the example in the original description:

float min_neg_f32(float a)
{
    return -std::min(-a, a);
}

-->
	vxorps	LCPI2_0(%rip), %xmm0, %xmm1
	vmaxss	%xmm0, %xmm1, %xmm0


This isn't fabs because of -0.0, but we miss that too even with -ffast-math, so that's another bug...

Comment 15 Sanjay Patel 2021-06-23 08:43:34 PDT

Filled in another missing FP min/max transform:
https://reviews.llvm.org/rG1e9b6b89a7b5